Methods for modification of target nucleic acids

ABSTRACT

Methods for modification of target nucleic acids. The method involves a construct in which guide RNA is covalently linked to donor RNA (fusion NA) to be introduced into the target nucleic acid by homologous recombination and is based on the introduction of a nuclease, e.g. CRISPR or TALEN, into the cell containing the target nucleic acid. The fusion NA may be introduced as a DNA vector.

This application is a continuation of Ser. No. 16/097,300, filed Oct. 29, 2018, which is a National Stage application of International Application No. PCT/EP2017/059331, filed Apr. 20, 2017, which claims the benefit of European Patent Application No. 16167773.7, filed Apr. 29, 2016, European Patent Application No. 16167774.5, filed Apr. 29, 2016, and European Patent Application No. 17156018.8, filed Feb. 14, 2017.

INCORPORATION BY REFERENCE OF MATERIAL SUBMITTED ELECTRONICALLY

The Sequence Listing, which is a part of the present disclosure, is submitted concurrently with the specification as an xml file (Name: “039621.00977-Sequence-Listing-ST26.xml”; Size: 228,028 bytes; Created: Sep. 16, 2022). The subject matter of the Sequence Listing is incorporated herein in its entirety by reference.

FIELD OF THE INVENTION

The invention is directed to improved methods for modification of target nucleic acids.

DESCRIPTION OF THE INVENTION

The CRISPR (clustered regularly interspaced short palindromic repeats) system was initially identified as an adaptive defense mechanism of bacteria belonging to the genus Streptococcus (WO2007/025097). Those bacterial CRISPR systems rely on guide RNA (gRNA) in complex with cleaving proteins to direct degradation of complementary sequences present within invading viral DNA. Cas9, the first identified protein of the CRISPR/Cas system, is a large monomeric DNA nuclease guided to a DNA target sequence adjacent to the PAM (protospacer adjacent motif) sequence motif by a complex of two noncoding RNAs: crRNA and trans-activating crRNA (tracrRNA). Later, a synthetic RNA chimera (single guide RNA or gRNA) created by fusing crRNA with tracrRNA was shown to be equally functional (Jinek et al. 2012).

Several research groups have found that the CRISPR cutting properties could be used to disrupt genes in almost any organism's genome with unprecedented ease (Mali P, et al (2013) Science. 339(6121):819-823; Cong L, et al (2013) Science 339(6121)). Recently it became clear that providing a template for repair allowed for editing the genome with nearly any desired sequence at nearly any site, transforming CRISPR into a powerful gene editing tool (WO/2014/150624, WO/2014/204728).

Gene targeting refers to site specific gene modification by nucleic acid deletion, insertion or replacement via homologous recombination (HR). Targeting efficiency is highly promoted by a double-strand break (DSB) in the genomic target. Also, the direct presence of homology after DSB of chromosomal DNA seems to nearly eliminate nonhomologous end joining (NHEJ) repair in favor of homologous recombination.

The invention at hand provides a guide nucleic acid fused to the appropriate donor for HR repair which interacts with a nucleic acid modifying polypeptide (fusion nucleic acid (fuNA) molecule comprising a guide nucleic acid (gNA) molecule covalently linked to at least one donor nucleic acid (doNA) molecule). To improve delivery of donor nucleic acid with homology to target DNA flanking a nuclease cutting site, a gene targeting strategy is presented where the donor nucleic acid (doNA) is covalently linked to the CRISPR components. In this way, the gene editing complex comprises not only the necessary recognition and cutting tools but also the template for modification. Upon recognition by the guide nucleic acid (gNA), the nuclease cleaves the target region, and the immediate synchronized presence of incoming donor will facilitate the HR process, thereby increasing gene repair efficacy.

Many microbial systems lack an efficient NHEJ system [Standage-Beier K, Zhang Q, Wang X (2015) Targeted Large-Scale Deletion of Bacterial Genomes Using CRISPR-Nickases. ACS Synth Biol. 4(11): 1217-1225]. For example, Clostridium cellulolyticum, which is important for bioenergy research, cannot be engineered easily with CRISPR/Cas9 [Xu T, Li Y, Shi Z, Hemme C L, Li Y, Zhu Y, Van Nostrand J D, He Z, Zhou J (2015) Efficient Genome Editing in Clostridium cellulolyticum via CRISPR-Cas9 Nickase. Appl Environ Microbiol. 81(13):4423-31], and attempts to knock out genes by introducing DSBs result in cell death. Another example is Escherichia coli, which also relies on homologous recombination (HR) for DSB repair. Current techniques for genome editing in such organisms focus on the use of CRISPR/Cas9-based nickases with subsequent inefficient repair via non-fused nucleic acid donors, inducible expression of CRISPR/Cas to coordinate nuclease action and the introduction of nucleic acid donors, or recombinases which lack flexibility in addressing custom DNA sequences.

The application of the FusionCRISPR technology, which is the provision and/or application of a CRISPR/Cas or CRISPR/Cas like system comprising a fuNA molecule, will simplify the creation of knockouts in a wide variety of microbial species formerly not amenable to high throughput targeting. By offering the donor nucleic acids promptly upon induction of a double strand break (DSB), the HR repair machinery will successfully ligate the break with accompanying template. The introduction of FusionCRISPR is performed with common techniques otherwise used for the introduction of nicking versions of CRISPR/Cas9, recombinases or any other (trans)genes of interest. These techniques include, but are not limited to, electroporation/heat shock of plasmids, viral transduction and conjugation.

Whereas genes in organisms with efficient NHEJ can be knocked out by introducing CRISPR/Cas9 in which the guide RNA consists of (1) a spacer matching the target gene and (2) essential sequences for correct guide RNA folding (tracrRNA:crRNA commonly combined in one single guide RNA (sgRNA)) with a typical length of ˜99 nucleotides, knockouts in DSB-sensitive microbes can be achieved through a simple adaptation of the RNA resulting in the following composition: (1) spacer, (2) essential sequences for correct secondary structure, (3) at least 15 nucleotides matching the target, and (4) at least 15 nucleotides matching the target one or more bases downstream from the first at least 15 nucleotides matching the target. Two complementary ssDNA oligonucleotides representing a knockout FusionCRISPR cassette can readily be purchased commercially from any oligo synthesis company. Alternatively, the cassette can be synthesized as dsDNA. Cloning of FusionCRISPR knockout cassettes can proceed similarly to the most common method for cloning of 20-nucleotide spacers for an intended target in regular CRISPR cassettes with type IIS restriction enzymes. Thus, access, flexibility and ease of use of controlled gene knockouts through FusionCRISPR are comparable to regular CRISPR/Cas9, and the current invention expands high throughput knockouts to target species in which NHEJ is inefficient or absent.
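The four-part composition described above can be illustrated with a minimal sketch. It assumes the cassette is written as DNA on the same strand as the supplied target sequence; the function name, the coordinate convention and the scaffold placeholder are hypothetical and are not taken from the specification.

```python
# Minimal sketch: assemble a knockout cassette as
# spacer + scaffold + homology arm 1 + homology arm 2.
# SCAFFOLD_PLACEHOLDER is a stand-in string, NOT a functional guide scaffold.
SCAFFOLD_PLACEHOLDER = "NNNNNNNNNN"


def build_knockout_cassette(target, spacer, scaffold, del_start, del_end, arm_len=15):
    """Return the cassette for deleting target[del_start:del_end].

    Arm 1 is the arm_len bases directly upstream of the deletion; arm 2 is
    the arm_len bases directly downstream, so the deleted bases are absent
    from the donor part of the cassette.
    """
    if arm_len < 15:
        raise ValueError("each homology arm needs at least 15 nucleotides")
    if not (arm_len <= del_start < del_end <= len(target) - arm_len):
        raise ValueError("deletion must leave room for both homology arms")
    arm1 = target[del_start - arm_len:del_start]
    arm2 = target[del_end:del_end + arm_len]
    return spacer + scaffold + arm1 + arm2
```

In this sketch the deletion is defined purely by the gap between the two arms, which mirrors the requirement that the second arm matches the target one or more bases downstream of the first.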

FusionCRISPR can also be used for knock-ins in any organism. A selectable marker can be knocked in using standard introduction techniques and standard expression vehicles, including simple plasmids. This can be done to interrupt and knock out an endogenous gene while providing easy selection.

Genome-scale knockouts in human and other cells allow the discovery of genes involved in diseases, drug response and normal biological processes and the creation of disease models [Shalem O, Sanjana N E, Hartenian E, Shi X, Scott D A, Mikkelsen T S, Heckl D, Ebert B L, Root D E, Doench J G, Zhang F (2014) Genome-scale CRISPR-Cas9 knockout screening in human cells. Science; 343(6166):84-7]. Knockouts are typically generated by introduction of CRISPR/Cas9 for a specific target and relying on NHEJ to create a “null” for the gene in question. Three major problems arise with this method: (1) knockouts are often not created because NHEJ repair does not result in a frame shift (or alternative downstream start codons are being used, resulting in a truncated gene product), (2) the outcome, i.e. the exact primary DNA sequence, is not known and needs to be determined for each DNA modifying “event”, which is costly and time-consuming, (3) in diploid or polyploid organisms the NHEJ repair on each chromosome will occur differently, if successful targeting on all available “substrates” is even achieved in the first place, resulting in complex molecular analysis and/or a forced switch to haploid model lines [Wade M (2015) High-throughput silencing using the CRISPR-Cas9 system: A review of the benefits and challenges. Journal of Biomolecular Screening, Vol. 20(8):1027-39]. FusionCRISPR allows control of the deletion as described above, which leads to a predictable outcome. In diploid and polyploid cell systems any introduced modification on each of the chromosomes will for the majority be identical. The deletion can be designed to be large enough so that alternative transcription possibilities are reduced, and by designing the deletion to consist of a number of bases that cannot be divided by three, the risk of still creating a functional protein which merely lacks a short stretch of amino acids will be eliminated.
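The divisibility rule at the end of the preceding paragraph reduces to a one-line check. The sketch below is illustrative only; both function names are hypothetical, and extending the deletion by one base is just one possible way to satisfy the rule.

```python
def is_frameshift_deletion(deletion_length):
    """True if removing this many bases shifts the reading frame."""
    if deletion_length <= 0:
        raise ValueError("deletion length must be positive")
    return deletion_length % 3 != 0


def nearest_frameshift_length(deletion_length):
    """Return the given length, extended by one base if it is a multiple of three."""
    if is_frameshift_deletion(deletion_length):
        return deletion_length
    return deletion_length + 1
```

A planned deletion of 30 bases would remove exactly ten codons and could leave an in-frame product, so the helper suggests 31 bases instead.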

Methods for the introduction of FusionCRISPR configurations designed to control knockouts in any cell system are identical to methods used for the introduction of CRISPR/Cas9 which is followed by NHEJ repair for creation of knockouts. This includes, but is not limited to, methods ranging from the use of AAV and lentiviral vectors for human cells to baculovirus expression systems for insect cells. No further adaptation to current methods is required other than a short extension of the guide RNA as described above.

Genome editing can be used to treat a variety of genetic disorders. Many genetic disorders involve point mutations which could potentially be corrected by providing a site specific nuclease along with a corrective nucleic acid template containing the required correction(s). One example is sickle cell disease, which results from a single DNA base mutation (A>T) in the sixth codon of the β-globin gene [Li C, Ding L, Sun C W, Wu L C, Zhou D, Pawlik K M, Khodadadi-Jamayran A, Westin E, Goldman F D, Townes T M (2016) Novel HDAd/EBV Reprogramming Vector and Highly Efficient Ad/CRISPR-Cas Sickle Cell Disease Gene Correction. Sci Rep. 6:30422]. Correction of mutations in cells by providing nuclease and corrective template separately will often result in uncoordinated initial break in the DNA and local arrival of the corrective template for HDR repair. To compensate for the lack of temporal/spatial coordination, nuclease and corrective template concentrations need to be relatively high. Higher nuclease concentrations can lead to higher off-target cleavage with negative consequences (higher patient risk and/or higher costs in molecular analysis). FusionCRISPR will achieve correct gene correction at much lower concentration. Methods for applying FusionCRISPR are identical to current standards in the field [Maeder M L, Gersbach C A (2016) Genome-editing Technologies for Gene and Cell Therapy. Mol Ther. 24(3):430-46] with the sole difference being a slightly longer sgRNA encompassing the desired correction.

FusionCRISPR can be used to alter substrate or product specificity of a variety of enzymes in a variety of different organisms. For example, a single amino acid has been found to be a major determinant in triterpene synthases substrate and product specificity in various plant species [Salmon M, Thimmappa R B, Minto R E, Melton R E, Hughes R K, O'Maille P E, Hemmings A M, Osbourn A (2016) A conserved amino acid residue critical for product and substrate specificity in plant triterpene synthases. Proc Natl Acad Sci USA. 113(30):E4407-14]. Triterpenes are a diverse group of natural products with applications in pharmacy and biotechnology, and the ability to influence product specificity opens the door to synthesis of novel or higher quantity biomolecules in a plant of interest. FusionCRISPR allows the modification of critical amino acids in substrate pockets or other domains of an enzyme that influence specificity. The desired amino acid is introduced by including the corresponding codon in the fused template flanked by nucleotide sequences that have homology with the sequences flanking the codon that needs to be replaced in the genome. One example of introducing the FusionCRISPR construct is by Agrobacterium-mediated T-DNA transformation in which the T-DNA contains a selectable marker, Cas9 driven by a strong promoter and the FusionCRISPR construct consisting of respectively: (1) an RNA polymerase III promoter, (2) spacer matching the intended target codon or a sequence adjacent to it (find the nearest PAM), (3) essential sequences for correct secondary structure of the guide, (4) homology arm matching one region flanking the break, (5) the novel codon(s) matching the amino acid(s) that affect specificity in the enzyme, (6) the homology arm on the other side of the break and (7) a terminator. Depending on where the cut is made, a silent mutation may have to be included to avoid having a PAM sequence in the donor. Homology arms, intended changes and sequences between intended changes will be contiguous for seamless incorporation of the donor. The selectable marker is used for obtaining plant cells that have stably integrated T-DNA. In these cells, throughout maintenance and regeneration of transformed plants, the FusionCRISPR components have the opportunity to alter the genome at the intended target. These alterations can make their way into the germline and the next generation of seedlings can be screened for the novel endogenous sequence and segregation of the T-DNA.
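As an illustration of elements (4) to (6) of such a construct, the following sketch builds a contiguous donor consisting of an upstream homology arm, the novel codon and a downstream homology arm. It assumes an in-frame coding sequence given as DNA and a zero-based codon index; the function name is hypothetical, and the sketch deliberately does not handle the PAM-removing silent mutation mentioned above.

```python
def build_codon_swap_donor(cds, codon_index, new_codon, arm_len=15):
    """Return arm1 + new_codon + arm2 for replacing one codon in cds.

    cds is an in-frame coding sequence (DNA); codon_index is zero-based.
    The arms are taken directly adjacent to the replaced codon so that the
    donor is contiguous with the surrounding genomic sequence.
    """
    new_codon = new_codon.upper()
    if len(new_codon) != 3 or set(new_codon) - set("ACGT"):
        raise ValueError("new_codon must be three DNA bases")
    if arm_len < 15:
        raise ValueError("each homology arm needs at least 15 nucleotides")
    start = 3 * codon_index
    end = start + 3
    if start - arm_len < 0 or end + arm_len > len(cds):
        raise ValueError("not enough flanking sequence for both homology arms")
    arm1 = cds[start - arm_len:start]
    arm2 = cds[end:end + arm_len]
    return arm1 + new_codon + arm2
```

Whether the resulting donor still contains a PAM next to the spacer match would have to be checked separately for the nuclease in use.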

FusionCRISPR allows the insertion of epitope or other tags at endogenous genes in any organism. The tags can be used for tracking of endogenous gene products at the cellular and subcellular level, identification of protein-protein interactions, ChIP and other molecular interactions and protein purification. Most applications will be in pathway discovery and identification of drug targets. The FusionCRISPR construct will have the nucleic acids encoding the tags as payload between the stretches of homology, similarly to what is described above for introduction of substitutions. Introduction of FusionCRISPR either as DNA or transiently will follow the same protocols as currently used for regular CRISPR/Cas9 in the organisms of interest.

Our yeast experiments have shown that the payload in FusionCRISPR can be at least 731 nucleotides without affecting the full function of the nuclease. A sequence of this or potentially larger size can comprise a strong promoter or promoter with different tissue or environmental cue specific activity compared to the natural promoter. FusionCRISPR can potentially displace the endogenous promoter and allow an alternative method of gene regulation including upregulation.

It is one objective of the invention at hand to simplify the application of the CRISPR/Cas DNA repair system or CRISPR/Cas like DNA repair system.

It is a further objective of the invention at hand to enhance the efficiency of homologous recombination in a target nucleic acid during DNA break repair.

Surprisingly this was achieved by covalently linking a donor nucleic acid molecule and a guide nucleic acid molecule, the latter interacting with a site directed nucleic acid modifying polypeptide, as elements of a CRISPR/Cas DNA repair system or a CRISPR/Cas like DNA repair system.

DETAILED DESCRIPTION OF THE INVENTION

One embodiment of the invention at hand is a method for modification of a target nucleic acid (target NA) molecule in a cell or composition comprising the steps of

a. providing a recombinant fusion nucleic acid (fuNA) molecule comprising a guide nucleic acid (gNA) molecule covalently linked to at least one donor nucleic acid (doNA) molecule, and
b. introducing said fuNA molecule into one or more cells or composition comprising the target NA molecule, and
c. introducing a site directed nucleic acid modifying polypeptide into said one or more cells or composition, and
d. incubating the one or more cells or composition under conditions that allow for homologous recombination in said one or more cells or composition, and optionally
e. isolating one or more cells in which homologous recombination occurred.

Various preferred structures of the fusion nucleic acid molecule according to the invention are depicted in FIGS. 1 to 12. The most preferred structure is depicted in FIG. 1.

The target nucleic acid may be modified by introducing a nucleic acid molecule into the target nucleic acid wherein the introduced nucleic acid molecule is heterologous to the target nucleic acid. The sequence between homology arm 1 and homology arm 2 of the donor nucleic acid molecule would in this case comprise the nucleic acid molecule which is supposed to be introduced into the target nucleic acid and which is not present between the regions in the target nucleic acid that are complementary to the homology arms.

The target nucleic acid may also be modified by deleting at least one base from the target nucleic acid. In that case, the sequence between homology arm 1 and homology arm 2 of the donor nucleic acid molecule would comprise a sequence complementary to the target nucleic acid molecule that lacks at least one base compared to the target nucleic acid.

The target nucleic acid may further be modified by replacing at least one base in the target nucleic acid with one or more bases heterologous to the target nucleic acid. In that case the sequence between homology arm 1 and 2 of the donor nucleic acid would comprise at least one mismatch compared to the complementary region in the target nucleic acid.

The target nucleic acid may comprise a “protospacer adjacent motif” (PAM) sequence adjacent to the targeted sequence in the target nucleic acid molecule which is required for some site directed nucleic acid modifying polypeptides for correct target site identification and binding. The sequence of the PAM is specific for the various site directed nucleic acid modifying polypeptides (Doudna and Charpentier, 2014, Science 346 (6213):1258096) and is known to the skilled person.

The method of the invention is preferably applied for target nucleic acid modification in living cells but may also be applied in in vitro systems.

The target nucleic acid molecule may be RNA or DNA; it may be single- or double-stranded. Preferably, the target nucleic acid molecule is DNA, more preferably the target nucleic acid molecule is double-stranded DNA.

The site directed nucleic acid modifying polypeptide may be introduced into the cell or composition as a polypeptide or may be introduced by introduction of an RNA molecule encoding said site directed nucleic acid modifying polypeptide or by introduction of an expression construct expressing said site directed nucleic acid modifying polypeptide wherein the expression construct is comprising a promoter functional in the respective cell or composition functionally linked to a gene encoding said site directed nucleic acid modifying polypeptide. Examples for such site directed nucleic acid modifying polypeptides are shown in Table 1. Further, any functional equivalent to such site directed nucleic acid modifying polypeptide may be used in the method of the invention.

TABLE 1 Examples for site directed nucleic acid modifying polypeptides

GenBank Acc No.   Bacterium
303229466         Veillonella atypica ACS-134-V-Col7a
34762592          Fusobacterium nucleatum subsp. vincentii
374307738         Filifactor alocis ATCC 35896
320528778         Solobacterium moorei F0204
291520705         Coprococcus catus GD-7
42525843          Treponema denticola ATCC 35405
304438954         Peptoniphilus duerdenii ATCC BAA-1640
224543312         Catenibacterium mitsuokai DSM 15897
24379809          Streptococcus mutans UA159
15675041          Streptococcus pyogenes SF370
16801805          Listeria innocua Clip11262
116628213         Streptococcus thermophilus LMD-9
323463801         Staphylococcus pseudintermedius ED99
352684361         Acidaminococcus intestini RyC-MR95
302336020         Olsenella uli DSM 7084
366983953         Oenococcus kitaharae DSM 17330
310286728         Bifidobacterium bifidum S17
258509199         Lactobacillus rhamnosus GG
300361537         Lactobacillus gasseri JV-V03
169823755         Finegoldia magna ATCC 29328
47458868          Mycoplasma mobile 163K
284931710         Mycoplasma gallisepticum str. F
363542550         Mycoplasma ovipneumoniae SC01
384393286         Mycoplasma canis PG 14
71894592          Mycoplasma synoviae 53
238924075         Eubacterium rectale ATCC 33656
116627542         Streptococcus thermophilus LMD-9
315149830         Enterococcus faecalis TX0012
315659848         Staphylococcus lugdunensis M23590
160915782         Eubacterium dolichum DSM 3991
336393381         Lactobacillus coryniformis subsp. torquens
310780384         Ilyobacter polytropus DSM 2926
325677756         Ruminococcus albus 8
187736489         Akkermansia muciniphila ATCC BAA-835
117929158         Acidothermus cellulolyticus 11B
189440764         Bifidobacterium longum DJO10A
283456135         Bifidobacterium dentium Bd1
38232678          Corynebacterium diphtheriae NCTC 13129
187250660         Elusimicrobium minutum Pei191
319957206         Nitratifractor salsuginis DSM 16511
325972003         Sphaerochaeta globus str. Buddy
261414553         Fibrobacter succinogenes subsp. succinogenes
60683389          Bacteroides fragilis NCTC 9343
256819408         Capnocytophaga ochracea DSM 7271
90425961          Rhodopseudomonas palustris BisB18
373501184         Prevotella micans F0438
294674019         Prevotella ruminicola 23
365959402         Flavobacterium columnare ATCC 49512
312879015         Aminomonas paucivorans DSM 12260
83591793          Rhodospirillum rubrum ATCC 11170
294086111         Candidatus Puniceispirillum marinum IMCC1322
121608211         Verminephrobacter eiseniae EF01-2
344171927         Ralstonia syzygii R24
159042956         Dinoroseobacter shibae DFL 12
288957741         Azospirillum sp. B510
92109262          Nitrobacter hamburgensis X14
148255343         Bradyrhizobium sp. BTAi1
34557790          Wolinella succinogenes DSM 1740
218563121         Campylobacter jejuni subsp. jejuni
291276265         Helicobacter mustelae 12198
229113166         Bacillus cereus Rock1-15
222109285         Acidovorax ebreus TPSY
189485225         uncultured Termite group 1
182624245         Clostridium perfringens D str.
220930482         Clostridium cellulolyticum H10
154250555         Parvibaculum lavamentivorans DS-1
257413184         Roseburia intestinalis L1-82
218767588         Neisseria meningitidis Z2491
15602992          Pasteurella multocida subsp. multocida
319941583         Sutterella wadsworthensis 3 1
254447899         gamma proteobacterium HTCC5015
54296138          Legionella pneumophila str. Paris
331001027         Parasutterella excrementihominis YIT 11859
34557932          Wolinella succinogenes DSM 1740
118497352         Francisella novicida U112
961512549         Francisella tularensis subsp. novicida U112
961512548         Acidaminococcus sp. BV3L6

The site directed nucleic acid modifying polypeptide may have a double-stranded nucleic acid digestion function or it may have a nickase function, cutting only one strand of a double-stranded nucleic acid molecule. The nucleic acid restriction or nickase capability of the site directed nucleic acid modifying polypeptide may also be inactivated and the recombinant site directed nucleic acid modifying polypeptide may be linked to other functional groups such as the DNA restriction region of FokI or of a homing endonuclease. Such recombinant site directed nucleic acid modifying polypeptides are for example described in Tsai et al (2014; Nat Biotechnol. 32(6):569-76) or Guilinger et al (2014; Nat Biotechnol. 32(6):577-82).

The gNA molecule comprises a spacer nucleic acid (spacer NA) molecule comprising at least 12 bases 100% complementary to the target NA molecule. Preferably it comprises at least 13 bases, at least 14 bases or at least 15 bases complementary to the target NA molecule. More preferably it comprises at least 16 bases, at least 17 bases or at least 18 bases complementary to the target NA molecule. Even more preferably it comprises at least 19 bases or at least 20 bases complementary to the target NA.
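One way to check a candidate spacer against the minimum of 12 fully complementary bases is sketched below. It assumes Watson-Crick pairing only, treats U as T, and reads "complementary" as a contiguous antiparallel match; the function names are hypothetical and the dynamic-programming search is an illustrative choice rather than a method taken from the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA or RNA sequence (U is treated as T)."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def longest_complementary_run(spacer, target):
    """Length of the longest contiguous stretch of the spacer that pairs
    perfectly with the given target strand."""
    probe = reverse_complement(spacer)
    target = target.upper().replace("U", "T")
    best = 0
    prev = [0] * (len(target) + 1)
    # Longest common substring between probe and target, row by row.
    for i in range(1, len(probe) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if probe[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def spacer_meets_minimum(spacer, target, minimum=12):
    """True if at least `minimum` contiguous bases are fully complementary."""
    return longest_complementary_run(spacer, target) >= minimum
```

Raising `minimum` to 15, 18 or 20 corresponds to the progressively preferred thresholds listed in the paragraph above.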

The gNA molecule further comprises a scaffold nucleic acid (scaffold NA) molecule. The scaffold NA may consist of one nucleic acid molecule, which comprises two regions each comprising at least eight bases being complementary to each other, capable of hybridizing and forming a hairpin structure. The scaffold NA may consist of two nucleic acid molecules each comprising at least one region of at least eight bases complementary to each other, capable of hybridizing and forming a double-stranded structure. If said regions comprise more than eight complementary bases, each region comprises at least eight bases that are complementary to at least eight bases of the other region. Preferably the scaffold NA consists of one molecule.

The scaffold NA molecule is covalently linked to the spacer NA molecule. In the event that the scaffold NA molecule consists of two independent molecules, at least one of these molecules of the scaffold NA is covalently linked to the spacer NA molecule. In addition to the two regions comprising at least eight bases being complementary to each other, the scaffold NA molecule comprises a further region forming a secondary structure comprising at least one hairpin, preferably at least two hairpins.

The donor NA molecule comprises two homology arms. Each homology arm of the donor NA molecule comprises at least 15 bases and is at least 5%, preferably at least 10%, more preferably at least 15%, most preferably at least 20% of the size of the additional NA region spacing the homology arms. Homology arm 1 and 2 may have the same length or different length.
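The two size conditions on a homology arm can be combined into a single check, sketched here with integer arithmetic to avoid rounding issues. The function name and parameter names are hypothetical; the default of 5% is the broadest threshold stated above, and 10, 15 or 20 can be passed for the preferred ranges.

```python
def homology_arm_is_long_enough(arm_len, spacing_region_len, min_percent=5):
    """True if the arm has at least 15 bases and is at least min_percent
    of the size of the region spacing the two homology arms."""
    if arm_len < 0 or spacing_region_len < 0:
        raise ValueError("lengths must be non-negative")
    meets_absolute_minimum = arm_len >= 15
    meets_relative_minimum = arm_len * 100 >= min_percent * spacing_region_len
    return meets_absolute_minimum and meets_relative_minimum
```

For a 731-nucleotide region between the arms, for instance, 5% corresponds to 36.55 bases, so under this reading an arm of 36 bases would fall short and an arm of 37 bases would suffice.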

The homology arms each comprise at least 15 bases that are 100% complementary to the same number of consecutive bases in the target NA molecule. In the event that a homology arm is larger than 15 bases, it is preferably at least 60%, preferably at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, even more preferably at least 95%, even more preferably at least 98%, even more preferably at least 99% complementary to the target NA molecule. Most preferably each homology arm is 100% complementary to the target NA molecule.
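Both criteria for a homology arm, overall percent complementarity and a perfectly complementary stretch of at least 15 consecutive bases, can be evaluated position by position once the arm is aligned to an equally long target region. The sketch below assumes an ungapped antiparallel alignment and Watson-Crick pairing only; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA or RNA sequence (U is treated as T)."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def arm_complementarity(arm, target_region):
    """Return (percent complementary, longest perfectly paired run) for a
    homology arm aligned without gaps to a target region of equal length."""
    if len(arm) != len(target_region) or not arm:
        raise ValueError("arm and target region must have the same non-zero length")
    probe = reverse_complement(arm)
    region = target_region.upper().replace("U", "T")
    matches = 0
    run = 0
    longest = 0
    for a, b in zip(probe, region):
        if a == b:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return 100 * matches / len(arm), longest


def arm_is_acceptable(arm, target_region, min_percent=60):
    """True if the arm has at least 15 consecutive complementary bases and
    meets the overall percent threshold."""
    percent, longest = arm_complementarity(arm, target_region)
    return longest >= 15 and percent >= min_percent
```

A 20-base arm with a single terminal mismatch would score 95% with a perfect run of 19 bases and would therefore pass at any threshold up to 95%.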

The method of the invention may be applied in cells or compositions comprising the target NA molecule. Preferably the method is applied to cells, wherein the cell is a microbial, animal, human or plant cell, more preferably the method is applied to yeast or plant cells.

The methods of the invention may be applied to any plant cell, for example gymnosperm or angiosperm, preferably angiosperm, for example dicotyledonous or monocotyledonous plant cells. Preferred monocotyledonous plant cells are for example corn, wheat, rice, barley, sorghum, musa, sugarcane, miscanthus and brachypodium; especially preferred monocotyledonous plant cells are corn, wheat and rice. Preferred dicotyledonous plant cells are for example soy, rape seed, canola, linseed, cotton, potato, sugar beet, tagetes and Arabidopsis; especially preferred dicotyledonous plant cells are soy, rape seed, canola and potato.

The method of the invention may also be applied to any microorganism. The microorganism may be a bacterium; the bacterial cell may be any gram-positive or gram-negative bacterium. Gram-positive bacteria include, but are not limited to, Bacillus, Brevibacterium, Corynebacterium, Streptococcus, Streptomyces, Staphylococcus, Enterococcus, Lactobacillus, Lactococcus, Clostridium, Geobacillus, and Oceanobacillus. Gram-negative bacteria include, but are not limited to, E. coli, Pseudomonas, Salmonella, Campylobacter, Helicobacter, Acetobacter, Flavobacterium, Fusobacterium, Gluconobacter. Preferably, the gram-negative cell is an E. coli cell.

In the methods of the present invention, the bacterial cell may be any Bacillus cell. Bacillus cells useful in the practice of the present invention include, but are not limited to, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, and Bacillus thuringiensis cells. In a preferred aspect, the bacterial cell is a Bacillus amyloliquefaciens, Bacillus lentus, Bacillus licheniformis, Bacillus stearothermophilus or Bacillus subtilis cell. In a more preferred aspect, the bacterial cell is a Bacillus licheniformis cell or a Bacillus subtilis cell, preferably a Bacillus subtilis cell.

In the methods of the present invention, the bacterial host cell may be Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus gasseri, Lactobacillus bulgaricus, Lactobacillus reuteri, Staphylococcus aureus, Corynebacterium, particularly the species Corynebacterium glutamicum, Corynebacterium acetoglutamicum, Corynebacterium acetoacidophilum, Corynebacterium callunae, Corynebacterium ammoniagenes, Corynebacterium thermoaminogenes, Corynebacterium melassecola, Corynebacterium efficiens, Corynebacterium deserti, Brevibacterium flavum, Brevibacterium lactofermentum, Brevibacterium divaricatum, Pseudomonas putida, Pseudomonas syringae, Streptomyces, particularly the species Streptomyces coelicolor, Streptomyces lividans, Streptomyces albus, Streptomyces avermitilis, Gluconobacter oxydans, Gluconobacter morbifer, Gluconobacter thailandicus, Acetobacter aceti, Clostridium acetobutylicum, Clostridium saccharobutylicum, Clostridium beijerinckii, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus. Another preferred bacterium is Basfia succiniciproducens.

The microorganism may be a eukaryotic cell. Suitable eukaryotic cells include yeast cells, as for example Saccharomyces spec, such as Saccharomyces cerevisiae, Hansenula spec, such as Hansenula polymorpha, Schizosaccharomyces spec, such as Schizosaccharomyces pombe, Kluyveromyces spec, such as Kluyveromyces lactis and Kluyveromyces marxianus, Yarrowia spec, such as Yarrowia lipolytica, Pichia spec, such as Pichia methanolica, Pichia stipitis and Pichia pastoris, Zygosaccharomyces spec, such as Zygosaccharomyces rouxii and Zygosaccharomyces bailii, Candida spec, such as Candida boidinii, Candida utilis, Candida freyschussii, Candida glabrata and Candida sonorensis, Schwanniomyces spec, such as Schwanniomyces occidentalis, Arxula spec, such as Arxula adeninivorans, Ogataea spec, such as Ogataea minuta, Klebsiella spec, such as Klebsiella pneumoniae.

The target nucleic acid molecule may be endogenous to the cell or it may be heterologous to the cell, such as, for example, a transgene or a viral nucleic acid molecule.

The doNA molecule and the gNA molecule are covalently bound to each other forming a fusion nucleic acid (fusion NA) molecule. The donor NA molecule may be covalently bound to the spacer NA part of the gNA molecule or to the scaffold NA part of the gNA molecule. In a preferred embodiment, the donor NA is covalently linked to the scaffold NA part of the gNA molecule.

Most preferably the fusion NA molecule is one molecule, preferably one continuous RNA molecule, wherein all elements (gNA, scaffold NA and doNA) are covalently linked.

The doNA molecule and guide NA molecule may consist of RNA, DNA or PNA. Preferably they consist of RNA or DNA. More preferably the doNA molecule consists of DNA and the guide NA molecule consists of RNA.

In a most preferred embodiment, both guide and donor NA consist of RNA, wherein at least the doNA and gNA are covalently linked to each other forming a fusion ribonucleic acid molecule (fuRNA).

The fuRNA molecule may be introduced into the cell or composition comprising the target NA molecule as an RNA molecule or as one or more expression constructs encoding said fuRNA molecules.

In another embodiment the doNA molecule and gNA molecule may consist of DNA, wherein the doNA and gNA are covalently linked to each other forming a fusion deoxyribonucleic acid molecule (fuDNA).

The fuDNA molecule may be introduced into the cell or composition comprising the target NA molecule as a DNA molecule by various methods as for example transfection, biolistics, electroporation, photoporation, whiskers, sonication, nanobodies or microfluids. It may also be introduced using an agrobacterium as a vehicle which is capable of transferring a T-DNA molecule into a cell but which is not capable of mediating the integration of said T-DNA molecule into the genomic DNA of the target cell. The T-DNA would comprise or consist of the fuDNA molecule.

A further embodiment of the invention is a recombinant fuNA molecule, for example a fuRNA or fuDNA molecule, comprising a doNA molecule covalently linked to a gNA molecule.

Another embodiment of the invention is a vector comprising an expression construct comprising a promoter functionally linked to a DNA molecule encoding the fuNA molecule of the invention.

A further embodiment of the invention is a cell comprising a fuNA molecule of the invention and a nucleic acid modifying polypeptide.

A further embodiment of the invention is a vector system comprising

-   a. a first vector comprising an expression construct comprising a promoter functionally linked to a DNA molecule encoding the fuNA molecule of the invention and
-   b. a second vector encoding a site directed nucleic acid modifying polypeptide and optionally
-   c. a third vector encoding one part of the scaffold NA molecule.

In a preferred embodiment, the vector under a. comprises an expression construct encoding a fuNA molecule comprising a spacer NA, scaffold NA and doNA.

The vector under c. is necessary and part of the vector system of the invention if the scaffold NA consists of two molecules and if the vector under a. encodes a fusion NA molecule comprising only one molecule of the scaffold NA molecule and does not encode the second molecule of the scaffold NA molecule.

A system for modification of a target NA in a cell comprising

-   A. a first vector comprising an expression construct comprising a promoter functionally linked to a DNA molecule encoding the fuNA molecule of the invention and
-   B. a second vector encoding a site directed nucleic acid modifying polypeptide and
-   C. a cell comprising a target NA molecule and optionally
-   D. a third vector encoding one part of the scaffold NA molecule.

is another embodiment of the invention. In a preferred embodiment, the vector under A. comprises an expression construct encoding a fuNA molecule comprising a spacer NA, scaffold NA and doNA. The vector under D. is necessary and part of the system of the invention if the scaffold NA consists of two molecules and if the vector under A. encodes a fusion NA molecule comprising only one molecule of the scaffold NA molecule and does not encode the second molecule of the scaffold NA molecule.

Another embodiment of the invention is a composition comprising

-   a. a first vector comprising an expression construct comprising a promoter functionally linked to a DNA molecule encoding the fuNA molecule of the invention and
-   b. a second vector encoding a site directed nucleic acid modifying polypeptide and
-   c. a cell comprising a target NA molecule and optionally
-   d. a third vector encoding one part of the scaffold NA molecule.

In a preferred embodiment, the vector under a. comprises an expression construct encoding a fuNA molecule comprising a spacer NA, scaffold NA and doNA. The vector under d. is necessary and part of the composition of the invention if the scaffold NA consists of two molecules and if the vector under a. encodes a fusion NA molecule comprising only one molecule of the scaffold NA molecule and does not encode the second molecule of the scaffold NA molecule.

The use of the vector of the invention, the vector system of the invention, the system of the invention and/or the composition of the invention for modification of a target NA molecule in a cell or composition is also an embodiment of the invention.

Definitions

It is to be understood that this invention is not limited to the particular methodology or protocols. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which will be limited only by the appended claims. It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “a vector” is a reference to one or more vectors and includes equivalents thereof known to those skilled in the art, and so forth. The term “about” is used herein to mean approximately, roughly, around, or in the region of. When the term “about” is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” is used herein to modify a numerical value above and below the stated value by a variance of 20 percent, preferably 10 percent up or down (higher or lower). As used herein, the word “or” means any one member of a particular list and also includes any combination of members of that list. The words “comprise,” “comprising,” “include,” “including,” and “includes” when used in this specification and in the following claims are intended to specify the presence of one or more stated features, integers, components, or steps, but they do not preclude the presence or addition of one or more other features, integers, components, steps, or groups thereof. For clarity, certain terms used in the specification are defined and used as follows:

Donor NA: the term “donor NA” or “doNA” means a nucleic acid comprising two homology arms each comprising at least 15 bases complementary to two different areas of at least 15 consecutive bases of the target NA, wherein said two homology arms are directly adjacent to each other or are separated by one or more additional bases. The two different areas of the target NA to which the homology arms are complementary may be directly adjacent to each other or may be separated by additional bases of up to 20 kb, preferably up to 10 kb, preferably up to 5 kb, more preferably up to 3 kb, more preferably up to 2.5 kb, more preferably up to 2 kb.

In the event a homology arm comprises more than 15 bases, it may be 100% complementary to the target NA or it may be at least 75% complementary, preferably at least 80% complementary, more preferably at least 85% complementary, more preferably at least 90% complementary, more preferably at least 95% complementary, more preferably at least 98% complementary to the target NA, wherein the homology arm comprises at least one stretch of at least 15 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, preferably the homology arm comprises at least one stretch of at least 18 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, more preferably the homology arm comprises at least one stretch of at least 20 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, even more preferably the homology arm comprises at least one stretch of at least 25 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, even more preferably the homology arm comprises at least one stretch of at least 50 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA.

The homology arms may have the same length and/or the same degree of complementarity to the target NA or may have different lengths and/or different degrees of complementarity to the target NA.

The homology arms may be directly adjacent to each other or may be separated by a nucleic acid molecule comprising at least one base not present between the regions in the target nucleic acid complementary to the homology arms.
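The complementarity criteria given above for a homology arm can be checked mechanically. The following is a minimal sketch, assuming DNA strings written 5′ to 3′ and an ungapped comparison of the arm against an equally long region of the target NA. The function names, the default thresholds (taken from the broadest values in the text) and the way the comparison is carried out are illustrative assumptions, not part of the disclosure.

```python
# Hypothetical helpers; an ungapped, position-by-position comparison is assumed.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def arm_complementarity(arm: str, target_region: str) -> tuple[float, int]:
    """Return (percent complementarity, longest perfectly complementary stretch)."""
    if not arm or len(arm) != len(target_region):
        raise ValueError("arm and target region must be non-empty and equally long")
    # A fully complementary arm equals the reverse complement of the target
    # region, so each arm position is compared against that sequence.
    expected = reverse_complement(target_region)
    matches = [a == e for a, e in zip(arm.upper(), expected)]
    longest = run = 0
    for is_match in matches:
        run = run + 1 if is_match else 0
        longest = max(longest, run)
    return 100.0 * sum(matches) / len(matches), longest


def meets_arm_definition(arm: str, target_region: str,
                         min_percent: float = 75.0, min_stretch: int = 15) -> bool:
    """Check arm length, overall complementarity and the perfect stretch."""
    percent, stretch = arm_complementarity(arm, target_region)
    return len(arm) >= 15 and percent >= min_percent and stretch >= min_stretch
```

Raising `min_percent` or `min_stretch` corresponds to the "preferably" and "more preferably" tiers of the definition.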

Spacer NA: the term “spacer nucleic acid” or “spacer NA” means a nucleic acid comprising at least 12 bases 100% complementary to the target NA.

In the event the spacer NA comprises more than 12 bases, it may be at least 75% complementary to the target NA, preferably at least 80% complementary, more preferably at least 85% complementary, more preferably at least 90% complementary, more preferably at least 95% complementary, more preferably at least 98% complementary, most preferably it is 100% complementary to the target NA, wherein the spacer NA comprises at least one stretch of at least 12 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, preferably the spacer NA comprises at least one stretch of at least 15 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, preferably the spacer NA comprises at least one stretch of at least 18 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, more preferably the spacer NA comprises at least one stretch of at least 20 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, even more preferably the spacer NA comprises at least one stretch of at least 25 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA, even more preferably the spacer NA comprises at least one stretch of at least 50 bases that are 100% complementary to a stretch of the same number of consecutive bases in the target NA.

The spacer NA is covalently linked to a scaffold NA. If the scaffold NA consists of two nucleic acid molecules, the spacer is covalently linked to one molecule of a scaffold NA.

Scaffold NA: the scaffold nucleic acid or scaffold NA comprises a nucleic acid forming a secondary structure comprising at least one hairpin, preferably at least two hairpins, and/or a sequence that is/are bound by the site directed nucleic acid modifying polypeptide. Such site directed nucleic acid modifying polypeptides are known in the art, for example in WO/2014/150624; WO/2014/204728. The scaffold NA further comprises two regions each comprising at least eight bases being complementary to each other, hence capable of hybridizing to form a double-stranded structure. If said regions of at least eight bases complementary to each other comprise more than eight bases, each region comprises at least eight bases that are complementary to at least eight bases of the other region.

The two complementary regions of the scaffold NA may be covalently linked to each other via a linker molecule forming a hairpin structure or may consist of two independent nucleic acid molecules.

Guide NA: the guide nucleic acid or guide NA or gNA comprises a spacer nucleic acid and a scaffold nucleic acid wherein the spacer NA and the scaffold NA are covalently linked to each other. In the event the scaffold NA consists of two molecules, the spacer NA is covalently linked to one molecule of the scaffold NA whereas the other molecule of the scaffold NA hybridizes to the first scaffold NA molecule. Hence, a guide NA molecule may consist of one nucleic acid molecule or may consist of two nucleic acid molecules. Preferably the guide NA consists of one molecule.

Fusion NA: the fusion nucleic acid comprises donor NA and guide NA, wherein the guide NA and the donor NA are covalently linked to each other.
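The nesting of the definitions above (a guide NA is a spacer NA linked to a scaffold NA; a fusion NA is a guide NA linked to a donor NA) can be pictured as a small data model. This is an illustrative sketch only: the class names, the assumed 5′ to 3′ order spacer, scaffold, donor, and the placeholder sequences are assumptions made for the example. The text also permits the donor NA to be linked to the spacer part instead, and a scaffold made of two molecules, neither of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideNA:
    spacer: str    # at least 12 bases complementary to the target NA
    scaffold: str  # structure-forming part bound by the modifying polypeptide

    def sequence(self) -> str:
        return self.spacer + self.scaffold


@dataclass(frozen=True)
class DonorNA:
    left_arm: str   # first homology arm, at least 15 bases
    insert: str     # optional bases between the arms (may be empty)
    right_arm: str  # second homology arm, at least 15 bases

    def sequence(self) -> str:
        return self.left_arm + self.insert + self.right_arm


@dataclass(frozen=True)
class FusionNA:
    guide: GuideNA
    donor: DonorNA

    def __post_init__(self) -> None:
        # Minimum lengths taken from the definitions of spacer NA and donor NA.
        if len(self.guide.spacer) < 12:
            raise ValueError("spacer NA needs at least 12 bases")
        if min(len(self.donor.left_arm), len(self.donor.right_arm)) < 15:
            raise ValueError("each homology arm needs at least 15 bases")

    def sequence(self) -> str:
        # Assumed arrangement: donor NA linked to the scaffold end of the guide NA.
        return self.guide.sequence() + self.donor.sequence()


# Placeholder sequences, chosen only to satisfy the length rules.
guide = GuideNA(spacer="ACGTACGTACGTACGTACGT", scaffold="GCGCGCGCAAAAGCGCGCGC")
donor = DonorNA(left_arm="A" * 15, insert="GG", right_arm="C" * 15)
fusion = FusionNA(guide, donor)
```

Keeping the three elements as separate fields makes it easy to reorder them for the alternative linkage described in the text.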

Site directed nucleic acid modifying polypeptide: By “site directed nucleic acid modifying polypeptide”, “nucleic acid-binding site directed nucleic acid modifying polypeptide” or “site directed polypeptide” it is meant a polypeptide that binds nucleic acids and is targeted to a specific nucleic acid sequence. A site-directed nucleic acid modifying polypeptide as described herein is targeted to a specific nucleic acid sequence in the target nucleic acid either by a mechanism intrinsic to the polypeptide or, preferably, by the nucleic acid molecule to which it is bound. The nucleic acid molecule bound by the polypeptide comprises a sequence that is complementary to a target sequence within the target nucleic acid, thus targeting the bound polypeptide to a specific location within the target nucleic acid (the target sequence).

Most site directed nucleic acid modifying polypeptides introduce dsDNA breaks, but they may be modified to have only nicking activity or the nuclease activity may be inactivated. The site directed nucleic acid modifying polypeptides may be bound to a further polypeptide having an activity such as fluorescence or nuclease activity such as the nuclease activity of the FokI polypeptide or a homing endonuclease polypeptide such as I-SceI.

Coding region: As used herein the term “coding region” when used in reference to a structural gene refers to the nucleotide sequences which encode the amino acids found in the nascent polypeptide as a result of translation of an mRNA molecule. The coding region is bounded, in eukaryotes, on the 5′-side by the nucleotide triplet “ATG” which encodes the initiator methionine; prokaryotes also use the triplets “GTG” and “TTG” as start codons. On the 3′-side it is bounded by one of the three triplets which specify stop codons (i.e., TAA, TAG, TGA). In addition, a gene may include sequences located on both the 5′- and 3′-end of the sequences which are present on the RNA transcript. These sequences are referred to as “flanking” sequences or regions (these flanking sequences are located 5′ or 3′ to the non-translated sequences present on the mRNA transcript). The 5′-flanking region may contain regulatory sequences such as promoters and enhancers which control or influence the transcription of the gene. The 3′-flanking region may contain sequences which direct the termination of transcription, posttranscriptional cleavage and polyadenylation.

Complementary: “Complementary” or “complementarity” refers to two nucleotide sequences which comprise antiparallel nucleotide sequences capable of pairing with one another (by the base-pairing rules) upon formation of hydrogen bonds between the complementary base residues in the antiparallel nucleotide sequences. For example, the sequence 5′-AGT-3′ is complementary to the sequence 5′-ACT-3′. Complementarity can be “partial” or “total.” “Partial” complementarity is where one or more nucleic acid bases are not matched according to the base pairing rules. “Total” or “complete” complementarity between nucleic acid molecules is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid molecule strands has significant effects on the efficiency and strength of hybridization between nucleic acid molecule strands. A “complement” of a nucleic acid sequence as used herein refers to a nucleotide sequence whose nucleic acid molecules show total complementarity to the nucleic acid molecules of the nucleic acid sequence.
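The antiparallel pairing in the example 5′-AGT-3′ / 5′-ACT-3′ can be made concrete with a short sketch. It assumes two strands of equal length written 5′ to 3′ and standard Watson-Crick pairs only (with U accepted for RNA). The function name and the returned labels are hypothetical and simply mirror the terms “total” and “partial” used in the definition.

```python
# Watson-Crick partners; U is accepted so that RNA strands can be compared too.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def classify_complementarity(strand_a: str, strand_b: str) -> tuple[str, int]:
    """Return ("total" or "partial", number of paired positions).

    Both strands are written 5'->3', so strand_b is reversed before the
    position-by-position comparison to reflect antiparallel pairing.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and equally long")
    a = strand_a.upper()
    b = strand_b.upper()[::-1]
    paired = sum((x, y) in PAIRS for x, y in zip(a, b))
    label = "total" if paired == len(a) else "partial"
    return label, paired


label, paired = classify_complementarity("AGT", "ACT")
```

Here the first base of one strand is checked against the last base of the other, which is why 5′-AGT-3′ and 5′-ACT-3′ pair at every position.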

Endogenous: An “endogenous” nucleotide sequence refers to a nucleotide sequence which is present in the genome of a wild type microorganism.

Enhanced expression: “enhance” or “increase” the expression of a nucleic acid molecule in a microorganism are used equivalently herein and mean that the level of expression of a nucleic acid molecule in a microorganism is higher compared to a reference microorganism, for example a wild type. The terms “enhanced” or “increased” as used herein mean higher, preferably significantly higher expression of the nucleic acid molecule to be expressed. As used herein, an “enhancement” or “increase” of the level of an agent such as a protein, mRNA or RNA means that the level is increased relative to a substantially identical microorganism grown under substantially identical conditions. As used herein, “enhancement” or “increase” of the level of an agent, such as for example a preRNA, mRNA, rRNA, tRNA, expressed by the target gene and/or of the protein product encoded by it, means that the level is increased 50% or more, for example 100% or more, preferably 200% or more, more preferably 5 fold or more, even more preferably 10 fold or more, most preferably 20 fold or more, for example 50 fold, relative to a suitable reference microorganism. The enhancement or increase can be determined by methods with which the skilled worker is familiar. Thus, the enhancement or increase of the nucleic acid or protein quantity can be determined for example by an immunological detection of the protein. Moreover, techniques such as protein assay, fluorescence, Northern hybridization, densitometric measurement of nucleic acid concentration in a gel, nuclease protection assay, reverse transcription (quantitative RT-PCR), ELISA (enzyme-linked immunosorbent assay), Western blotting, radioimmunoassay (RIA) or other immunoassays and fluorescence-activated cell analysis (FACS) can be employed to measure a specific protein or RNA in a microorganism. Depending on the type of the induced protein product, its activity or the effect on the phenotype of the microorganism may also be determined. Methods for determining the protein quantity are known to the skilled worker. Examples which may be mentioned are: the microBiuret method (Goa J (1953) Scand J Clin Lab Invest 5:218-222), the Folin-Ciocalteau method (Lowry O H et al. (1951) J Biol Chem 193:265-275) or measuring the absorption of CBB G-250 (Bradford M M (1976) Analyt Biochem 72:248-254).

Expression: “Expression” refers to the biosynthesis of a gene product, preferably to the transcription and/or translation of a nucleotide sequence, for example an endogenous gene or a heterologous gene, in a cell. For example, in the case of a structural gene, expression involves transcription of the structural gene into mRNA and, optionally, the subsequent translation of mRNA into one or more polypeptides. In other cases, expression may refer only to the transcription of the DNA harboring an RNA molecule.

Foreign: The term “foreign” refers to any nucleic acid molecule (e.g., gene sequence) which is introduced into a cell by experimental manipulations and may include sequences found in that cell as long as the introduced sequence contains some modification (e.g., a point mutation, the presence of a selectable marker gene, etc.) and is therefore different relative to the naturally-occurring sequence.

Functional fragment: the term “functional fragment” refers to any nucleic acid and/or protein which comprises merely a part of the full length nucleic acid and/or full length polypeptide of the invention but still provides the same function, i.e. the function of an AAT enzyme catalyzing the reaction of acryloyl-CoA and butanol to n-BA and CoA. Preferably, the fragment comprises at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99% of the sequence from which it is derived. Preferably, the functional fragment comprises contiguous nucleic acids or amino acids of the nucleic acid and/or protein from which the functional fragment is derived. A functional fragment of a nucleic acid molecule encoding a protein means a fragment of the nucleic acid molecule encoding a functional fragment of the protein.

Functional linkage: The term “functional linkage” or “functionally linked” is equivalent to the term “operable linkage” or “operably linked” and is to be understood as meaning, for example, the sequential arrangement of a regulatory element (e.g. a promoter) with a nucleic acid sequence to be expressed and, if appropriate, further regulatory elements (such as e.g., a terminator) in such a way that each of the regulatory elements can fulfill its intended function to allow, modify, facilitate or otherwise influence expression of said nucleic acid sequence. As a synonym the wording “operable linkage” or “operably linked” may be used. The expression may result, depending on the arrangement of the nucleic acid sequences, in sense or antisense RNA. To this end, direct linkage in the chemical sense is not necessarily required. Genetic control sequences such as, for example, enhancer sequences, can also exert their function on the target sequence from positions which are further away, or indeed from other DNA molecules. Preferred arrangements are those in which the nucleic acid sequence to be expressed recombinantly is positioned behind the sequence acting as promoter, so that the two sequences are linked covalently to each other. In a preferred embodiment, the nucleic acid sequence to be transcribed is located behind the promoter in such a way that the transcription start is identical with the desired beginning of the chimeric RNA of the invention. Functional linkage, and an expression construct, can be generated by means of customary recombination and cloning techniques as described (e.g., Sambrook J, Fritsch E F and Maniatis T (1989); Silhavy et al. (1984) Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor (N.Y.); Ausubel et al. (1987) Current Protocols in Molecular Biology, Greene Publishing Assoc. and Wiley Interscience; Gelvin et al. (Eds) (1990) Plant Molecular Biology Manual; Kluwer Academic Publisher, Dordrecht, The Netherlands). However, further sequences, which, for example, act as a linker with specific cleavage sites for restriction enzymes, or as a signal peptide, may also be positioned between the two sequences. The insertion of sequences may also lead to the expression of fusion proteins. Preferably, the expression construct, consisting of a linkage of a regulatory region, for example a promoter, and a nucleic acid sequence to be expressed, can exist in a vector-integrated form or can be inserted into the genome, for example by transformation.

Gene: The term “gene” refers to a region operably linked to appropriate regulatory sequences capable of regulating the expression of the gene product (e.g., a polypeptide or a functional RNA) in some manner. A gene includes untranslated regulatory regions of DNA (e.g., promoters, enhancers, repressors, etc.) preceding (upstream) and following (downstream) the coding region (open reading frame, ORF). The term “structural gene” as used herein is intended to mean a DNA sequence that is transcribed into mRNA which is then translated into a sequence of amino acids characteristic of a specific polypeptide.

Genome and genomic DNA: The terms “genome” or “genomic DNA” refer to the heritable genetic information of a host organism. Said genomic DNA comprises the DNA of the nucleoid but also the DNA of the self-replicating plasmid.

Heterologous: The term “heterologous” with respect to a nucleic acid molecule or DNA refers to a nucleic acid molecule which is operably linked to, or is manipulated to become operably linked to, a second nucleic acid molecule to which it is not operably linked in nature, or to which it is operably linked at a different location in nature. A heterologous expression construct comprising a nucleic acid molecule and one or more regulatory nucleic acid molecules (such as a promoter or a transcription termination signal) linked thereto is, for example, a construct originating by experimental manipulations in which either a) said nucleic acid molecule, or b) said regulatory nucleic acid molecule or c) both (i.e. (a) and (b)) is not located in its natural (native) genetic environment or has been modified by experimental manipulations, an example of a modification being a substitution, addition, deletion, inversion or insertion of one or more nucleotide residues. Natural genetic environment refers to the natural genomic locus in the organism of origin, or to the presence in a genomic library. In the case of a genomic library, the natural genetic environment of the sequence of the nucleic acid molecule is preferably retained, at least in part. The environment flanks the nucleic acid sequence at least at one side and has a sequence of at least 50 bp, preferably at least 500 bp, especially preferably at least 1,000 bp, very especially preferably at least 5,000 bp, in length. A naturally occurring expression construct, for example the naturally occurring combination of a promoter with the corresponding gene, becomes a transgenic expression construct when it is modified by non-natural, synthetic “artificial” methods such as, for example, mutagenization. Such methods have been described (U.S. Pat. No. 5,565,350; WO 00/15815). For example, a protein encoding nucleic acid molecule operably linked to a promoter which is not the native promoter of this molecule is considered to be heterologous with respect to the promoter. Preferably, heterologous DNA is not endogenous to or not naturally associated with the cell into which it is introduced, but has been obtained from another cell or has been synthesized. Heterologous DNA also includes an endogenous DNA sequence which contains some modification, non-naturally occurring multiple copies of an endogenous DNA sequence, or a DNA sequence which is not naturally associated with another DNA sequence physically linked thereto. Generally, although not necessarily, heterologous DNA encodes RNA or proteins that are not normally produced by the cell in which it is expressed.

Hybridization: The term “hybridization” as used herein includes “any process by which a strand of nucleic acid molecule joins with a complementary strand through base pairing.” (J. Coombs (1994) Dictionary of Biotechnology, Stockton Press, New York). Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acid molecules) are impacted by such factors as the degree of complementarity between the nucleic acid molecules, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acid molecules. As used herein, the term “Tm” is used in reference to the “melting temperature.” The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the Tm of nucleic acid molecules is well known in the art. As indicated by standard references, a simple estimate of the Tm value may be calculated by the equation: Tm=81.5+0.41(% G+C), when a nucleic acid molecule is in aqueous solution at 1 M NaCl [see e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)]. Other references include more sophisticated computations, which take structural as well as sequence characteristics into account for the calculation of Tm. Stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6.
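As a worked instance of the simple estimate quoted above, the sketch below evaluates Tm = 81.5 + 0.41(% G+C) for a given sequence. The function name and the example sequence are hypothetical. The formula is only the rough estimate given in the text for 1 M NaCl, with the result taken to be in °C, and it ignores length and structural effects.

```python
def estimate_tm(seq: str) -> float:
    """Rough melting temperature from G+C content: 81.5 + 0.41 * (% G+C)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Percentage of positions that are G or C.
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return 81.5 + 0.41 * gc_percent


tm = estimate_tm("ATGCATGC")  # 4 of 8 bases are G or C, i.e. 50 % G+C
```

For a sequence with 50% G+C content the estimate works out to 81.5 + 0.41 × 50 = 102.0.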

Suitable hybridization conditions are for example hybridizing under conditions equivalent to hybridization in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 2×SSC, 0.1% SDS at 50° C. (low stringency) to a nucleic acid molecule comprising at least 50, preferably at least 100, more preferably at least 150, even more preferably at least 200, most preferably at least 250 consecutive nucleotides of the complement of a sequence. Other suitable hybridizing conditions are hybridization in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 1×SSC, 0.1% SDS at 50° C. (medium stringency) or 65° C. (high stringency) to a nucleic acid molecule comprising at least 50, preferably at least 100, more preferably at least 150, even more preferably at least 200, most preferably at least 250 consecutive nucleotides of a complement of a sequence. Other suitable hybridization conditions are hybridization in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 0.1×SSC, 0.1% SDS at 65° C. (very high stringency) to a nucleic acid molecule comprising at least 50, preferably at least 100, more preferably at least 150, even more preferably at least 200, most preferably at least 250 consecutive nucleotides of a complement of a sequence.

“Identity”: “Identity” when used in respect to the comparison of two or more nucleic acid or amino acid molecules means that the sequences of said molecules share a certain degree of sequence similarity, the sequences being partially identical.

For the determination of the percentage identity of two or more amino acid sequences or of two or more nucleotide sequences several computer software programs have been developed. The identity of two or more sequences can be calculated with for example the software fasta, which presently has been used in the version fasta 3 (W. R. Pearson and D. J. Lipman, PNAS 85, 2444 (1988); W. R. Pearson, Methods in Enzymology 183, 63 (1990)). Another useful program for the calculation of identities of different sequences is the standard blast program, which is included in the Biomax pedant software (Biomax, Munich, Federal Republic of Germany). This unfortunately sometimes leads to suboptimal results since blast does not always include complete sequences of the subject and the query. Nevertheless, as this program is very efficient, it can be used for the comparison of a huge number of sequences. The following settings are typically used for such comparisons of sequences:

-p Program Name [String];
-d Database [String]; default=nr;
-i Query File [File In]; default=stdin;
-e Expectation value (E) [Real]; default=10.0;
-m alignment view options: 0=pairwise; 1=query-anchored showing identities; 2=query-anchored no identities; 3=flat query-anchored, show identities; 4=flat query-anchored, no identities; 5=query-anchored no identities and blunt ends; 6=flat query-anchored, no identities and blunt ends; 7=XML Blast output; 8=tabular; 9=tabular with comment lines [Integer]; default=0;
-o BLAST report Output File [File Out] Optional; default=stdout;
-F Filter query sequence (DUST with blastn, SEG with others) [String]; default=T;
-G Cost to open a gap (zero invokes default behavior) [Integer]; default=0;
-E Cost to extend a gap (zero invokes default behavior) [Integer]; default=0;
-X X dropoff value for gapped alignment (in bits) (zero invokes default behavior); blastn 30, megablast 20, tblastx 0, all others 15 [Integer]; default=0;
-I Show GI's in deflines [T/F]; default=F;
-q Penalty for a nucleotide mismatch (blastn only) [Integer]; default=−3;
-r Reward for a nucleotide match (blastn only) [Integer]; default=1;
-v Number of database sequences to show one-line descriptions for (V) [Integer]; default=500;
-b Number of database sequences to show alignments for (B) [Integer]; default=250;
-f Threshold for extending hits, default if zero; blastp 11, blastn 0, blastx 12, tblastn 13, tblastx 13, megablast 0 [Integer]; default=0;
-g Perform gapped alignment (not available with tblastx) [T/F]; default=T;
-Q Query Genetic code to use [Integer]; default=1;
-D DB Genetic code (for tblast[nx] only) [Integer]; default=1;
-a Number of processors to use [Integer]; default=1;
-O SeqAlign file [File Out] Optional;
-J Believe the query defline [T/F]; default=F;
-M Matrix [String]; default=BLOSUM62;
-W Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer]; default=0;
-z Effective length of the database (use zero for the real size) [Real]; default=0;
-K Number of best hits from a region to keep (off by default, if used a value of 100 is recommended) [Integer]; default=0;
-P 0 for multiple hit, 1 for single hit [Integer]; default=0;
-Y Effective length of the search space (use zero for the real size) [Real]; default=0;
-S Query strands to search against database (for blast[nx], and tblastx); 3 is both, 1 is top, 2 is bottom [Integer]; default=3;
-T Produce HTML output [T/F]; default=F;
-l Restrict search of database to list of GI's [String] Optional;
-U Use lower case filtering of FASTA sequence [T/F] Optional; default=F;
-y X dropoff value for ungapped extensions in bits (0.0 invokes default behavior); blastn 20, megablast 10, all others 7 [Real]; default=0.0;
-Z X dropoff value for final gapped alignment in bits (0.0 invokes default behavior); blastn/megablast 50, tblastx 0, all others 25 [Integer]; default=0;
-R PSI-TBLASTN checkpoint file [File In] Optional;
-n MegaBlast search [T/F]; default=F;
-L Location on query sequence [String] Optional;
-A Multiple Hits window size, default if zero (blastn/megablast 0, all others 40) [Integer]; default=0;
-w Frame shift penalty (OOF algorithm for blastx) [Integer]; default=0;
-t Length of the largest intron allowed in tblastn for linking HSPs (0 disables linking) [Integer]; default=0.

Results of high quality are reached by using the algorithm of Needleman and Wunsch or Smith and Waterman. Therefore programs based on said algorithms are preferred. Advantageously the comparisons of sequences can be done with the program PileUp (J. Mol. Evolution, 25, 351 (1987), Higgins et al., CABIOS 5, 151 (1989)) or preferably with the programs “Gap” and “Needle”, which are both based on the algorithms of Needleman and Wunsch (J. Mol. Biol. 48; 443 (1970)), and “BestFit”, which is based on the algorithm of Smith and Waterman (Adv. Appl. Math. 2; 482 (1981)). “Gap” and “BestFit” are part of the GCG software-package (Genetics Computer Group, 575 Science Drive, Madison, Wis., USA 53711 (1991); Altschul et al., Nucleic Acids Res. 25, 3389 (1997)); “Needle” is part of the European Molecular Biology Open Software Suite (EMBOSS) (Trends in Genetics 16 (6), 276 (2000)). Therefore preferably the calculations to determine the percentages of sequence identity are done with the programs “Gap” or “Needle” over the whole range of the sequences. The following standard adjustments for the comparison of nucleic acid sequences were used for “Needle”: matrix: EDNAFULL, Gap_penalty: 10.0, Extend_penalty: 0.5. The following standard adjustments for the comparison of nucleic acid sequences were used for “Gap”: gap weight: 50, length weight: 3, average match: 10.000, average mismatch: 0.000.

For example a sequence, which is said to have 80% identity with sequence SEQ ID NO: 1 at the nucleic acid level is understood as meaning a sequence which, upon comparison with the sequence represented by SEQ ID NO: 1 by the above program “Needle” with the above parameter set, has 80% identity. Preferably the identity is calculated on the complete length of the query sequence, for example SEQ ID NO: 1.
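The notion of percent identity over a global alignment can be illustrated with a minimal Needleman-Wunsch sketch. This is not the “Needle” program itself. It assumes match/mismatch scores of 5/−4 (taken here to reflect the A, C, G, T entries of EDNAFULL) and, as a simplification, a single linear gap penalty of 10 instead of the separate gap-open (10.0) and gap-extend (0.5) penalties. The function name and defaults are hypothetical.

```python
def needleman_wunsch_identity(a, b, match=5, mismatch=-4, gap=-10):
    """Globally align a and b; return (percent identity, aligned a, aligned b).

    Simplified sketch: linear gap penalty, not the affine open/extend
    scheme used by EMBOSS Needle.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # substitution score for a[i-1] versus b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    aligned_a = "".join(reversed(out_a))
    aligned_b = "".join(reversed(out_b))

    # Identity = identical columns / alignment length (gap columns count as non-identical).
    identical = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a), aligned_a, aligned_b
```

Under these assumptions, a query that differs from a 10-nucleotide reference by one substitution yields 90% identity, and a sequence would meet an 80% threshold if the returned percentage is at least 80.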

Isolated: The term “isolated” as used herein means that a material has been removed by the hand of man and exists apart from its original, native environment and is therefore not a product of nature. An isolated material or molecule (such as a DNA molecule or enzyme) may exist in a purified form or may exist in a non-native environment such as, for example, in a transgenic host cell. For example, a naturally occurring nucleic acid molecule or polypeptide present in a living cell is not isolated, but the same nucleic acid molecule or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such nucleic acid molecules can be part of a vector and/or such nucleic acid molecules or polypeptides could be part of a composition, and would be isolated in that such a vector or composition is not part of its original environment. Preferably, the term “isolated” when used in relation to a nucleic acid molecule, as in “an isolated nucleic acid sequence” refers to a nucleic acid sequence that is identified and separated from at least one contaminant nucleic acid molecule with which it is ordinarily associated in its natural source. An isolated nucleic acid molecule is a nucleic acid molecule present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acid molecules are nucleic acid molecules such as DNA and RNA, which are found in the state they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs, which encode a multitude of proteins. However, an isolated nucleic acid sequence comprising for example SEQ ID NO: 1 includes, by way of example, such nucleic acid sequences in cells which ordinarily contain SEQ ID NO: 1 where the nucleic acid sequence is in a genomic or plasmid location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid sequence may be present in single- or double-stranded form. When an isolated nucleic acid sequence is to be utilized to express a protein, the nucleic acid sequence will contain at a minimum at least a portion of the sense or coding strand (i.e., the nucleic acid sequence may be single-stranded). Alternatively, it may contain both the sense and anti-sense strands (i.e., the nucleic acid sequence may be double-stranded).

Non-coding: The term “non-coding” refers to sequences of nucleic acid molecules that do not encode part or all of an expressed protein. Non-coding sequences include but are not limited to enhancers, promoter regions, 3′ untranslated regions, and 5′ untranslated regions.

Nucleic acids and nucleotides: The terms “nucleic acids” and “nucleotides” refer to naturally occurring or synthetic or artificial nucleic acid or nucleotides. The terms “nucleic acids” and “nucleotides” comprise deoxyribonucleotides or ribonucleotides or any nucleotide analogue and polymers or hybrids thereof in either single- or double-stranded, sense or antisense form. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. The term “nucleic acid” is used interchangeably herein with “gene”, “cDNA”, “mRNA”, “oligonucleotide,” and “nucleic acid molecule”. Nucleotide analogues include nucleotides having modifications in the chemical structure of the base, sugar and/or phosphate, including, but not limited to, 5-position pyrimidine modifications, 8-position purine modifications, modifications at cytosine exocyclic amines, substitution of 5-bromo-uracil, and the like; and 2′-position sugar modifications, including but not limited to, sugar-modified ribonucleotides in which the 2′-OH is replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2, or CN. Short hairpin RNAs (shRNAs) also can comprise non-natural elements such as non-natural bases, e.g., inosine and xanthine, non-natural sugars, e.g., 2′-methoxy ribose, or non-natural phosphodiester linkages, e.g., methylphosphonates, phosphorothioates and peptides.

Nucleic acid sequence: The phrase “nucleic acid sequence” refers to a single- or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′- to the 3′-end. It includes chromosomal DNA, self-replicating plasmids, infectious polymers of DNA or RNA and DNA or RNA that performs a primarily structural role. “Nucleic acid sequence” also refers to a consecutive list of abbreviations, letters, characters or words, which represent nucleotides. In one embodiment, a nucleic acid can be a “probe” which is a relatively short nucleic acid, usually less than 100 nucleotides in length. Often a nucleic acid probe is from about 50 nucleotides in length to about 10 nucleotides in length. A “target region” of a nucleic acid is a portion of a nucleic acid that is identified to be of interest. A “coding region” of a nucleic acid is the portion of the nucleic acid, which is transcribed and translated in a sequence-specific manner to produce a particular polypeptide or protein when placed under the control of appropriate regulatory sequences. The coding region is said to encode such a polypeptide or protein.

Oligonucleotide: The term “oligonucleotide” refers to an oligomer or polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimetics thereof, as well as oligonucleotides having non-naturally-occurring portions which function similarly. Such modified or substituted oligonucleotides are often preferred over native forms because of desirable properties such as, for example, enhanced cellular uptake, enhanced affinity for nucleic acid target and increased stability in the presence of nucleases. An oligonucleotide preferably includes two or more nucleomonomers covalently coupled to each other by linkages (e.g., phosphodiesters) or substitute linkages.

Overhang: An “overhang” is a relatively short single-stranded nucleotide sequence on the 5′- or 3′-hydroxyl end of a double-stranded oligonucleotide molecule (also referred to as an “extension,” “protruding end,” or “sticky end”).

Polypeptide: The terms “polypeptide”, “peptide”, “oligopeptide”, “gene product”, “expression product” and “protein” are used interchangeably herein to refer to a polymer or oligomer of consecutive amino acid residues.

Promoter: The terms “promoter” and “promoter sequence” are equivalents and, as used herein, refer to a DNA sequence which when operably linked to a nucleotide sequence of interest is capable of controlling the transcription of the nucleotide sequence of interest into RNA. A promoter is located 5′ (i.e., upstream), proximal to the transcriptional start site of a nucleotide sequence of interest whose transcription into mRNA it controls, and provides a site for specific binding by RNA polymerase and other transcription factors for initiation of transcription. The promoter does not comprise coding regions or untranslated regions. The promoter may for example be heterologous or homologous to the respective cell. A nucleic acid molecule sequence is “heterologous to” an organism or a second nucleic acid molecule sequence if it originates from a foreign species, or, if from the same species, is modified from its original form. For example, a promoter operably linked to a heterologous coding sequence refers to a coding sequence from a species different from that from which the promoter was derived, or, if from the same species, a coding sequence which is not naturally associated with the promoter (e.g. a genetically engineered coding sequence or an allele from a different ecotype or variety). Suitable promoters can be derived from genes of the host cells where expression should occur or from pathogens for this host.

Purified: As used herein, the term “purified” refers to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated. “Substantially purified” molecules are at least 60% free, preferably at least 75% free, and more preferably at least 90% free from other components with which they are naturally associated. A purified nucleic acid sequence may be an isolated nucleic acid sequence.

Significant increase: An increase for example in enzymatic activity, gene expression, productivity or yield of a certain product, that is larger than the margin of error inherent in the measurement technique, preferably an increase by about 10% or 25%, preferably by 50% or 75%, more preferably 2-fold or 5-fold or greater of the activity, expression, productivity or yield of the control enzyme or expression in the control cell, productivity or yield of the control cell, even more preferably an increase by about 10-fold or greater.

Significant decrease: A decrease for example in enzymatic activity, gene expression, productivity or yield of a certain product, that is larger than the margin of error inherent in the measurement technique, preferably a decrease by at least about 5% or 10%, preferably by at least about 20% or 25%, more preferably by at least about 50% or 75%, even more preferably by at least about 80% or 85%, most preferably by at least about 90%, 95%, 97%, 98% or 99%.

Substantially complementary: In its broadest sense, the term “substantially complementary”, when used herein with respect to a nucleotide sequence in relation to a reference or target nucleotide sequence, means a nucleotide sequence having a percentage of identity between the substantially complementary nucleotide sequence and the exact complementary sequence of said reference or target nucleotide sequence of at least 60%, more desirably at least 70%, more desirably at least 80% or 85%, preferably at least 90%, more preferably at least 93%, still more preferably at least 95% or 96%, yet still more preferably at least 97% or 98%, yet still more preferably at least 99% or most preferably 100% (the latter being equivalent to the term “identical” in this context). Preferably identity is assessed over a length of at least 19 nucleotides, preferably at least 50 nucleotides, more preferably the entire length of the nucleic acid sequence to said reference sequence (if not specified otherwise below). Sequence comparisons are carried out using default GAP analysis with the University of Wisconsin GCG, SEQWEB application of GAP, based on the algorithm of Needleman and Wunsch (Needleman and Wunsch (1970) J Mol. Biol. 48: 443-453; as defined above). A nucleotide sequence “substantially complementary” to a reference nucleotide sequence hybridizes to the reference nucleotide sequence under low stringency conditions, preferably medium stringency conditions, most preferably high stringency conditions (as defined above).

Transgene: The term “transgene” as used herein refers to any nucleic acid sequence, which is introduced into the genome of a cell by experimental manipulations. A transgene may be an “endogenous DNA sequence,” or a “heterologous DNA sequence” (i.e., “foreign DNA”). The term “endogenous DNA sequence” refers to a nucleotide sequence, which is naturally found in the cell into which it is introduced so long as it does not contain some modification (e.g., a point mutation, the presence of a selectable marker gene, etc.) relative to the naturally-occurring sequence.

Transgenic: The term “transgenic” when referring to an organism means transformed, preferably stably transformed, with at least one recombinant nucleic acid molecule.

Vector: As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid molecule to which it has been linked. One type of vector is a genomic integrated vector, or “integrated vector”, which can become integrated into the genomic DNA of the host cell. Another type of vector is an episomal vector, i.e., a plasmid or a nucleic acid molecule capable of extra-chromosomal replication. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors”. In the present specification, “plasmid” and “vector” are used interchangeably unless otherwise clear from the context.

Wild type: The term “wild type”, “natural” or “natural origin” means with respect to an organism that said organism is not changed, mutated, or otherwise manipulated by man. With respect to a polypeptide or nucleic acid sequence, it means that the polypeptide or nucleic acid sequence is naturally occurring or available in at least one naturally occurring organism which is not changed, mutated, or otherwise manipulated by man.

A wild type of a microorganism refers to a microorganism whose genome is present in a state as before the introduction of a genetic modification of a certain gene. The genetic modification may be e.g. a deletion of a gene or a part thereof or a point mutation or the introduction of a gene.

The terms “production” or “productivity” are art-recognized and include the concentration of the fermentation product (for example, dsRNA) formed within a given time and a given fermentation volume (e.g., kg product per hour per liter). The term “efficiency of production” includes the time required for a particular level of production to be achieved (for example, how long it takes for the cell to attain a particular rate of output of a fine chemical).

The term “yield” or “product/carbon yield” is art-recognized and includes the efficiency of the conversion of the carbon source into the product (i.e., fine chemical). This is generally written as, for example, kg product per kg carbon source. By increasing the yield or production of the compound, the quantity of recovered molecules or of useful recovered molecules of that compound in a given amount of culture over a given amount of time is increased.

The term “recombinant microorganism” includes microorganisms which have been genetically modified such that they exhibit an altered or different genotype and/or phenotype (e.g., when the genetic modification affects coding nucleic acid sequences of the microorganism) as compared to the wild type microorganism from which it was derived. A recombinant microorganism comprises at least one recombinant nucleic acid molecule.

The term “recombinant” with respect to nucleic acid molecules refers to nucleic acid molecules produced by man using recombinant nucleic acid techniques. The term comprises nucleic acid molecules which as such do not exist in nature or do not exist in the organism from which the nucleic acid molecule is derived, but are modified, changed, mutated or otherwise manipulated by man. Preferably, a “recombinant nucleic acid molecule” is a non-naturally occurring nucleic acid molecule that differs in sequence from a naturally occurring nucleic acid molecule by at least one nucleic acid. A “recombinant nucleic acid molecule” may also comprise a “recombinant construct” which comprises, preferably operably linked, a sequence of nucleic acid molecules not naturally occurring in that order. Preferred methods for producing said recombinant nucleic acid molecules may comprise cloning techniques, directed or non-directed mutagenesis, gene synthesis or recombination techniques.

An example of such a recombinant nucleic acid molecule is a plasmid into which a heterologous DNA-sequence has been inserted or a gene or promoter which has been mutated compared to the gene or promoter from which the recombinant nucleic acid molecule derived. The mutation may be introduced by means of directed mutagenesis technologies known in the art or by random mutagenesis technologies such as chemical, UV light or x-ray mutagenesis or directed evolution technologies.

The term “directed evolution” is used synonymously with the term “metabolic evolution” herein and involves applying a selection pressure that favors the growth of mutants with the traits of interest. The selection pressure can be based on different culture conditions, ATP and growth coupled selection and redox related selection. The selection pressure can be carried out with batch fermentation with serial transferring inoculation or continuous culture with the same pressure.

The term “expression” or “gene expression” means the transcription of a specific gene(s) or specific genetic vector construct. The term “expression” or “gene expression” in particular means the transcription of gene(s) or genetic vector construct into mRNA. The process includes transcription of DNA and may include processing of the resulting RNA-product. The term “expression” or “gene expression” may also include the translation of the mRNA and therewith the synthesis of the encoded protein, i.e. protein expression.

FIGURES

FIGS. 1 to 12 depict preferred structures of the fusion nucleic acid molecules of the invention. A site directed nucleic acid modifying polypeptide is directed to the target sequence within the target double-stranded nucleic acid by a guide NA fused to a donor NA (which together form the fuNA molecule).

FIG. 1

Fusion NA molecule comprising from 5′ to 3′: the guide NA (spacer followed by scaffold), homology arm 1 and 2 optionally separated by an additional nucleic acid region.

FIG. 2

Fusion NA molecule comprising from 5′ to 3′: homology arm 2 and 1 optionally separated by an additional nucleic acid region, and the guide NA (scaffold followed by spacer).

FIG. 3

Fusion NA molecule comprising from 5′ to 3′: the guide NA (spacer followed by scaffold), homology arm 2 and 1 optionally separated by an additional nucleic acid region.

FIG. 4

Fusion NA molecule comprising from 5′ to 3′: homology arm 1 and 2 optionally separated by an additional nucleic acid region, and the guide NA (scaffold followed by spacer).

FIG. 5

Fusion NA molecule comprising from 5′ to 3′: homology arm 1 and 2 optionally separated by an additional nucleic acid region, and the guide NA (spacer followed by scaffold).

FIG. 6

Fusion NA molecule comprising from 5′ to 3′: the guide NA (scaffold followed by spacer), homology arm 1 and 2 optionally separated by an additional nucleic acid region.

FIG. 7

Fusion NA molecule comprising from 5′ to 3′: homology arm 2 and 1 optionally separated by an additional nucleic acid region, and the guide NA (spacer followed by scaffold).

FIG. 8

Fusion NA molecule comprising from 5′ to 3′: the guide NA (scaffold followed by spacer), homology arm 2 and 1 optionally separated by an additional nucleic acid region.

FIG. 9

Fusion NA molecules comprising from 5′ to 3′: guide NA (comprising spacer and first molecule of the scaffold), homology arm 1 and 2 optionally separated by an additional nucleic acid region. The second molecule of the scaffold is hybridizing to the first molecule of the scaffold.

FIG. 10

Fusion NA molecules comprising from 5′ to 3′: homology arm 1 and 2 optionally separated by an additional nucleic acid region, guide NA (comprising first molecule of the scaffold and spacer). The second molecule of the scaffold is hybridizing to the first molecule of the scaffold.

FIG. 11

Fusion NA molecules comprising from 5′ to 3′: guide NA (comprising spacer and first molecule of the scaffold), homology arm 2 and 1 optionally separated by an additional nucleic acid region. The second molecule of the scaffold is hybridizing to the first molecule of the scaffold.

FIG. 12

Fusion NA molecules comprising from 5′ to 3′: homology arm 2 and 1 optionally separated by an additional nucleic acid region, guide NA (comprising first molecule of the scaffold, spacer, and second molecule of the scaffold hybridizing to the first molecule of the scaffold).

FIG. 13

Vector RWL121.

FIG. 14

Vector Cas003.

FIG. 15

Vector Cas018.

FIG. 16

Vector Cas006.

FIG. 17

Vector RWL137.

FIG. 18

Vector Cas019.

FIG. 19

Vector RLW138.

FIG. 20

Vector Cas020.

FIG. 21

Vector RLW139.

FIG. 22

shows the amylase amyE locus of B. subtilis ATCC6051 strain (A) with the location of the homology regions HomA and HomB as indicated. The location of the protospacer sequence PS within the amyE gene is indicated (A) and the sequence of the PS highlighted (black with white letters, B) (top strand is SEQ ID NO: 74; bottom strand is SEQ ID NO: 75).

FIG. 23

shows the vector map of the pCC004 plasmid, the derivative of the pJOE8999 plasmid carrying the amyE protospacer and the homology regions HomA and HomB of the region adjacent to the amyE gene. PS=protospacer; PvanP*=hemisynthetic promoter; 'gRNA=guide RNA; lambda TO terminator; PmanP=promoter of the manP gene of B. subtilis; Cas9=endonuclease from S. pyogenes; KanR=kanamycin resistance gene; origin of pUC for replication in E. coli, origin of pE194 for replication in Bacillus.

FIG. 24

shows the schematic drawing of the EcoRI/XbaI fragment of the various plasmids used in this study (as exemplified in FIG. 2 with plasmid pCC004). The Cas9 endonuclease, the PmanP promoter and the vector backbone with pUC replication origin, pE194 replication origin, kanamycin resistance gene are not shown. The promoter (Pro) driving the transcription of downstream genetic elements is indicated. PS=Protospacer; gRNA=guide RNA consisting of crRNA-loop-tracrRNA, T=lambda TO terminator; homology region A and homology region B are depicted as arrows indicative of the orientation of the amyE gene. Detailed description of the plasmid genetic elements by J. Altenbuchner (Altenbuchner J. 2016. Editing of the Bacillus subtilis genome by the CRISPR-Cas9 system. Appl Environ Microbiol 82:5421-5).

FIG. 25

The gene knockout efficiency as exemplified for the amylase gene for each gene deletion construct (pJOE8999, pCC005-pCC008) relative to pCC004 is plotted against the deletion constructs as indicated.

FIG. 26

shows 0.8% agarose gels of PCR reactions with oligonucleotides SEQ ID NO: 60 and 61 on genomic DNA of 13 individual clones from gene deletion reactions with indicated plasmids pCC004, pCC005, pCC006, pCC007, pCC008. The amplification of a DNA fragment of 1.4 kb indicates gene knockout by recombination whereas a DNA fragment of 3.4 kb indicates amylase gene inactivation rather by an SOS repair mechanism. The 3.4 kb band for WT indicates the wildtype amylase locus of B. subtilis WT ATCC6051. C denotes water control with no genomic DNA added. M indicates DNA ladder ‘Perfect plus 1 kb DNA ladder’ (roboklon) with the size of three bands indicated (1.0 kb, 1.5 kb, 4.0 kb).
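The element orders described for FIGS. 1 to 8 can be summarized as a small lookup table. The sketch below is purely illustrative: the names `LAYOUTS` and `assemble_fusion_na` are hypothetical, the single-letter "sequences" in the usage note are placeholders rather than real spacer, scaffold or homology-arm sequences, and the two-molecule scaffolds of FIGS. 9 to 12 are not modelled.

```python
# 5'-to-3' element order of the fusion NA for each of FIGS. 1 to 8,
# transcribed from the figure descriptions above.
LAYOUTS = {
    "FIG. 1": ["spacer", "scaffold", "arm1", "arm2"],
    "FIG. 2": ["arm2", "arm1", "scaffold", "spacer"],
    "FIG. 3": ["spacer", "scaffold", "arm2", "arm1"],
    "FIG. 4": ["arm1", "arm2", "scaffold", "spacer"],
    "FIG. 5": ["arm1", "arm2", "spacer", "scaffold"],
    "FIG. 6": ["scaffold", "spacer", "arm1", "arm2"],
    "FIG. 7": ["arm2", "arm1", "spacer", "scaffold"],
    "FIG. 8": ["scaffold", "spacer", "arm2", "arm1"],
}


def assemble_fusion_na(figure, parts, additional_region=""):
    """Concatenate the elements in the order given for `figure`.

    parts: dict with keys "spacer", "scaffold", "arm1", "arm2".
    additional_region: optional sequence placed between the two homology arms.
    """
    order = LAYOUTS[figure]
    pieces = []
    for idx, name in enumerate(order):
        pieces.append(parts[name])
        # The optional additional region sits between the two homology arms.
        next_is_arm = idx + 1 < len(order) and order[idx + 1].startswith("arm")
        if name.startswith("arm") and next_is_arm:
            pieces.append(additional_region)
    return "".join(pieces)
```

With placeholder parts `{"spacer": "S", "scaffold": "C", "arm1": "1", "arm2": "2"}`, the FIG. 1 layout gives the string `SC12`, and the FIG. 2 layout with an additional region `x` gives `2x1CS`.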

EXAMPLES

Chemicals and Common Methods

Unless indicated otherwise, cloning procedures carried out for the purposes of the present invention including restriction digest, agarose gel electrophoresis, purification of nucleic acids, ligation of nucleic acids, transformation, selection and cultivation of bacterial cells are performed as described (Sambrook J, Fritsch E F and Maniatis T (1989)). Sequence analyses of recombinant DNA are performed with a laser fluorescence DNA sequencer (Applied Biosystems, Foster City, Calif., USA) using the Sanger technology (Sanger et al., 1977). Unless described otherwise, chemicals and reagents are obtained from Sigma Aldrich (Sigma Aldrich, St. Louis, USA), from Promega (Madison, Wis., USA), Duchefa (Haarlem, The Netherlands) or Invitrogen (Carlsbad, Calif., USA). Restriction endonucleases are from New England Biolabs (Ipswich, Mass., USA) or Roche Diagnostics GmbH (Penzberg, Germany). Oligonucleotides are synthesized by Eurofins MWG Operon (Ebersberg, Germany).

Introduction to Experimental Procedures

A yeast codon-optimized version of the Cas9 protein bearing a C-terminal SV40 nuclear localization signal (SEQ ID NO: 1) was synthesized and cloned into a yeast expression vector. The same vector included one or more guide RNAs (gRNAs) expressed from the Saccharomyces cerevisiae SNR52 polymerase III promoter.

Cas9 binds DNA and cleaves both strands upon recognition of a target sequence by the gRNA, but only if the correct protospacer-adjacent motif (PAM) is present at the 3′ end. Theoretically, any sequence of the form GN20GG can be targeted. A second vector was therefore constructed for co-expression in yeast of a reporter system (GAL4-UAS (SEQ ID NO: 7)) to be targeted by the designed CRISPR system. gRNA-donor fusions (fusion NA) were used to target and repair several non-functional Gal4 targets (SEQ ID NOs: 9-15). Gal4 (SEQ ID NO: 8) is a yeast transcriptional activator consisting of two components: the DNA binding domain located at the N-terminus and the region for transcriptional activation at the C-terminus. Gal4 binds to the specific recognition sequence UAS (upstream activating sequence) of marker genes in the yeast genome, activating their transcription. The MaV203 yeast strain contains single copies of each of three reporter genes (HIS3, URA3 and lacZ) that are stably integrated at different loci in the yeast genome. The promoter regions of URA3, HIS3, and lacZ are unrelated (except for the presence of GAL4 binding sites).

Several non-functional (deleted and/or disrupted by insertion of a STOP codon) versions of Gal4 were synthesized (SEQ ID NOs: 9-15) and transformed into yeast cells, so that they could be targeted and repaired by the co-expressed CRISPR machinery. Restoration of the full-length Gal4 by homologous recombination (HR) with the appropriate repair donor sequence provided with the CRISPR components results in activation of lacZ and HIS3 reporter genes. Gal4 gene repair and consequent transcription activation can be monitored by cell growth on plates lacking histidine, whereas induction of the lacZ gene results in a blue color when assayed with X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside).

The employed yeast strain contains two additional auxotrophic mutations (leu2 and trp1) to allow selection for both expression constructs.

To verify the increase in repair efficacy of the fusion system disclosed here, all experiments were performed in parallel with non-fused cassettes, in which donor and guide RNA are transcribed separately.

Yeast Strain, Media and Cultivation Conditions

The Saccharomyces cerevisiae strain used in the examples described is MaV203 (MATα, leu2-3,112, trp1-901, his3Δ200, ade2-101, gal4Δ, gal80Δ, SPAL10::URA3, GAL1::lacZ, HIS3UAS GAL1::HIS3@LYS2, can1R, cyh2R), commercialized by Life Technologies. Yeast was grown in Synthetic Minimal Media (SD Media) based upon Yeast Nitrogen Base supplemented with 2% glucose and lacking the appropriate auxotrophic compounds (ForMedium, United Kingdom). Cultures were grown at 30° C., either in a shaker or incubation oven.

Escherichia coli was used as propagation microorganism for all the plasmids used in our experiments, as well as for further propagation and maintenance of the modified targets. E. coli was grown according to standard microbiological practices (Molecular Cloning: A Laboratory Manual, 3rd ed., Vols 1, 2 and 3. J. F. Sambrook and D. W. Russell, ed., Cold Spring Harbor Laboratory Press, 2001). Plasmids containing the Cas9, guide RNA and donor NA included a pUC-based replication origin and ampicillin resistance gene for replication and maintenance in E. coli, whereas GAL4 target plasmids contained a gentamicin resistance gene (Gmr).

Example 1 Plasmid Construction

The Cas9 gene was a yeast codon-optimized version of the Streptococcus pyogenes Cas9 (SpCas9; WO2007/025097) originally constructed for expression in eukaryotic cells (Mali et al (2013) Science 339(6121); Cong et al (2013) Science 339(6121)). This Cas9 gene was tagged with a SV40 nuclear localization signal at both ends and synthesised. Also, the gRNA and donor expression cassette containing the SNR52 promoter for in vivo RNA synthesis were synthesised.

The GAL4-AD coding sequence in pDEST22 (Life Technologies) was replaced by the synthetic Cas9 via Seamless Cloning (Life Technologies). This vector contains the constitutive moderate-strength promoter and transcription terminator of the yeast Alcohol Dehydrogenase gene (ADH1) for expression in yeast as well as a TRP1 gene for selection in yeast on medium lacking tryptophan.

The same vector contains two recombination sites, attR1 and attR2, flanking a chloramphenicol resistance gene (Cmr) and a ccdB gene, allowing the designed gRNA and donor expression cassettes (as fusion or dual molecule) to be introduced in the same expression vector via Gateway Cloning (Life Technologies). Following the LR recombination reaction, the Cmr and ccdB genes were replaced by the fusion NA cassette or non-fused donor and guide expression cassettes.

Modified GAL4 coding sequences used as targets for CRISPR repair in yeast were synthesized. The pDEST32 plasmid for expression in yeast (Life Technologies) was cut with HindIII and SacII and the backbone, containing the ADH1 promoter and terminator, was gel purified. The GAL4 synthesized inserts were assembled into the vector using Seamless Cloning. This vector included a LEU2 gene for selection in yeast on medium lacking leucine.

Target-sites for recognition by Cas9 in the GAL4 sequence were empirically selected by choosing 20-mer regions preceding potential PAM (NGG) sequences within the GAL4 gene (Sternberg et al (2014); Nature 507(7490)).
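The selection rule above (a 20-mer immediately followed by an NGG PAM) can be sketched as a simple sequence scan. This is an illustration only, not the procedure actually used to pick the GAL4 target sites; the function names are hypothetical, and the scan merely lists candidates on both strands without ranking them or checking for off-targets.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_target_sites(seq, spacer_len=20):
    """List candidate sites: a spacer_len-mer directly followed by an NGG PAM.

    Returns tuples (strand, start, protospacer, pam). For the "-" strand,
    start is the 0-based position within the reverse-complemented sequence.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidates be reported as well.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Each returned protospacer would correspond to a candidate spacer for the guide NA; any further filtering is left to the user.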

To facilitate Cas9 binding and R-loop formation, we chose a single guide RNA design with the secondary structure containing a dangling spacer, an extended hairpin region and a long 3′ end, as initially designed by Jinek et al (2012) Science; 337(6096).

Example 2 Yeast Transformation

Simultaneous transformation of the CRISPR editing tools (Cas9 enzyme and fusion NA expressing cassette) and GAL4 target plasmid was performed by heat-shock as described in the manufacturer's protocol (Life Technologies), and the cells were propagated in the appropriate synthetic complete (SC) media lacking the auxotrophic compounds complemented by the plasmids being introduced (leucine and tryptophan). The transformed cells were allowed to propagate overnight and equal amounts of transformants (according to OD measurement) were transferred to solid plates containing synthetic complete (SC) media lacking histidine with 100 mM 3-Amino-1,2,4-triazole (3-AT; ForMedium, UK). Expression of HIS3 (allowing yeast to grow in medium without histidine) is GAL4-dependent and therefore transformants are only able to grow if GAL4 repair has occurred. Moreover, 3-AT is a competitive inhibitor of the product of the HIS3 gene; by applying 3-AT to the yeast transformants, which are dependent upon HIS3 to produce histidine, an increased level of HIS3 expression is required in order for the yeast cells to survive.

Additionally, the yeast strain used contained a lacZ marker gene under the control of GAL4, which allowed for blue/white selection of GAL4-repaired transformants. Induction of the lacZ gene results in a blue color when assayed with X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside).

Example 3 X-gal Assay

Transformants growing on plates lacking histidine were replica plated onto a nitrocellulose membrane (Hybond, GE Healthcare) placed on the surface of a plate with YPAD medium (complex yeast medium containing a homogeneous blend of peptone, yeast extract and glucose; ForMedium, UK). The assay was performed after 18-24 h incubation of the YPAD plates containing a membrane. For each membrane, 5 mg X-gal were dissolved in 50 μl DMF and combined with 30 μl 2-mercaptoethanol and 5 ml Z buffer. This solution was used to saturate two round filter papers (Whatman 541) in a 15-cm petri dish. Using forceps, the membrane was carefully removed from the surface of the YPAD plate and completely immersed in liquid nitrogen for about 20 seconds. The frozen membrane was placed on top of the soaked Whatman filters (colony side up). The plates were tightly covered and incubated at 37° C. Appearance of blue color was monitored after 24 hours.

Example 4 Sequencing of Target (CRISPR Repaired) Plasmids

For each experiment, at least eight GAL4-repaired positive transformants (colonies able to grow in medium without histidine) were sub-cultured overnight in liquid medium and the GAL4-containing plasmid was isolated (using Zymoprep Yeast Plasmid Miniprep II, Zymo Research). The isolated plasmids were introduced into E. coli for further propagation and commercial sequencing. GAL4 sequencing allowed verification of the sequence repair and assembly with the donor molecules.

Sanger sequencing of the Gal4 gene in the positive clones further validated the sequence specificity of this targeting process and showed no difference in repair between cells expressing the donor and gRNA as a fusion and cells expressing them non-fused, even though cells transformed with fusion NA exhibit a much higher number of successful HR events.

Example 5 Deleting 1 nt with 15 bp Homology Arms

Fusion of the donor (donor 1; SEQ ID NO: 26) to the guide RNA resulted in repaired transformants (able to grow on medium lacking histidine), whereas no growth was observed for transformants with non-fused guide and donor RNA. The low efficiency of gene repair is consistent with the reduced sequence overlap available for homologous recombination.

Example 6 Deleting 1 nt with 50 bp Homology Arms

Fusion of the donor (donor 2; SEQ ID NO: 27) to the guide RNA resulted in at least 50 times more transformants than with non-fused donor and guide NA.

Sequencing of the Gal4 gene in the positive clones showed that repair results only or very largely from HR (no evidence of NHEJ in any of the sequenced clones).

Example 7 Inserting 20 nt with 50/26 bp Homology Arms

The same fusion NA as above was used to repair a similar target where 20 nt were removed (target 3; SEQ ID NO: 11), and as a consequence one homology arm was reduced. Fusion resulted in about 5 times more transformants than with non-fused donor and guide NA. Sequencing of the Gal4 gene in the positive clones showed that repair results exclusively from homologous recombination.

Example 8 Inserting Missing 400 bp with 50 bp Homology Arms (while Testing Two Target Sequences 3 nt Apart)

We tested for simultaneous targeting of two sequences (spacer 2 and spacer 3; SEQ ID NOs: 20 and 21) located in close proximity (3 nt gap between the two 20 nt targets), both independently and together (multiplexed targeting). The multiplex fusion cassette consisted of a promoter followed by two tandem fusion NA sequences, resulting in production of a single molecule composed of two gRNAs and repair templates. Our experiments clearly showed that fusion NA is also amenable to targeting two sequences simultaneously.

For both targets, repair in the presence of the donor-guide fusion was considerably more efficient than with the non-fused version (up to 10 times more for targeting with spacer 2 and five times for spacer 3).

Example 9 Inserting Full GAL4 Gene (960 bp) Except HR Ends with 120 bp Homology Arms

In order to test whether fusion CRISPR could be effective for introduction of full-length coding sequences, we tested introducing the full-length GAL4 gene (SEQ ID NO: 7). As an example, we selected 120 bp homology arms so as to keep the ratio of donor/homology arm length already found to be effective in example 4. Insertion of the full-length GAL4 gene is about four times more effective with the Fusion construct.

Our results show that targeted editing is more efficient, by a factor of about four to at least 50 depending on the edit, when the repair donor sequence is fused to the gRNA. The experiments performed indicate a broad Fusion-related improvement in effectiveness, from a single base removal up to full gene insertion. The examples reported show that this CRISPR fusion system is suitable to carry relatively large donor molecules fused to the guide RNA.

Example 10a Constructs for Expression in Rice

To adapt the CRISPR/Cas system to Agrobacterium-mediated plant transformation, Gateway binary T-DNA vectors have been designed for co-expression of Cas9 nuclease and the guide RNA-donor expression cassette (either as single or dual RNA molecules). A version of the Streptococcus pyogenes Cas9 (SpCas9) codon-optimized for expression in rice (Oryza sativa), attached to SV40 nuclear localization signals (NLS) at both ends (Seq ID NO: 6), was synthesized. The synthesized cassette includes the maize polyubiquitin (Ubi) promoter (Seq ID NO: 32) for constitutive expression located upstream of the Cas9, and the nopaline synthase (nos) terminator (Seq ID NO: 33) at the 3′-end. This gene cassette has been cloned via Seamless Cloning into a vector, which contained as functional elements within the T-DNA borders: a plant selectable marker; a screenable marker expression cassette; and a Gateway cassette intended for LR recombination with the gRNA-donor expression cassette in an entry clone.

Three gRNAs have been designed that target the rice Protoporphyrinogen Oxidase (PPO) gene (WO/2015/092706; WO/2015/022640) (Seq ID NO: 35), resulting in genomic double-strand cleavage at selected target sites (spacer 8, spacer 9 and spacer 10 (Seq ID NO: 36, 37 and 38)). The modifications aim at two amino acid substitutions (L419F, F442V; single site mutations and double site mutation), which have been previously identified as potential hotspots for Saflufenacil survival.

The RNA expressing cassettes (including gene-specific spacer sequences for the selected locations in the PPO gene) containing either fusion or non-fusion NA were synthesized and cloned into entry vectors, which were cloned (via Gateway) into the destination vector containing the CAS9 expression cassette. RNA expression of gRNA and donor is driven by the pol III type promoter of U3 snRNA.

After the LR recombination step, the resulting expression vector is transformed into Agrobacterium strain LBA4044 according to methods well known in the art.

Example 10b Constructs for Expression in Rice

A vector identical to that described in Example 10a was synthesized, with the exception that the NLS derived from SV40 was replaced with a plant nuclear localization signal (NLS) (MSERKRREKL, SEQ ID NO: 71) at the N-terminal end and an importin NLS (KRPAATKKAGQAKKKK, SEQ ID NO: 72) at the C-terminal end, and the promoter driving the RNA expression of gRNA and donor was the rice pol III type promoter of U3 snRNA (SEQ ID NO: 73).

The RNA expressing cassettes (including gene-specific spacer sequences for the selected locations in the PPO gene) containing either fusion or non-fusion NA were synthesized and cloned into entry vectors, which were cloned (via Gateway) into the destination vector containing the CAS9 expression cassette.

The vector used as non-fusion control contains PRO0231::U3 RNA pol III promoter::spacer::sgRNA scaffold::TTTTTTTT terminator::U3 RNA pol III promoter::template::TTTTTTTT terminator.

After the LR recombination step, the resulting expression vector is transformed into Agrobacterium strain LBA4044 according to methods well known in the art.

Example 11 Rice Transformation and Selection of Herbicide-Tolerant Calli

The Agrobacterium containing the expression vector is used to transform scutellum-derived callus of indica rice (Oryza sativa L.). Sterilization of mature seeds is carried out by incubating for one minute in 70% ethanol, followed by 40 minutes in 6% sodium hypochlorite, followed by 3 to 5 washes with sterile MQ water. The sterilized seeds are then germinated on a medium containing 2,4-D (callus induction medium). After 6 days of incubation in the light, scutellum-derived calli are incubated for 90 seconds in bacterial solution (OD₆₀₀=0.1), drained, dried on sterile filter paper and then co-cultured with bacteria for 3 days in the dark at 25° C. The co-cultivated calli are transferred to selection medium containing G418 for 4 weeks in the light at 32° C. Antibiotic-resistant callus pieces are transferred to selection medium containing 25 or 50 μM saflufenacil (Kixor™) for 2 weeks in the light at 32° C. These herbicide selection conditions have been established through the analysis of tissue survival in kill curves with saflufenacil. After transfer of herbicide-resistant material to a regeneration medium and incubation in the light, the embryogenic potential is released and shoots develop in the next four to five weeks. Shoots are excised from the calli and incubated for 2 to 3 weeks on an auxin-containing medium until they are well rooted for transfer to soil. Hardened shoots are grown under high humidity and short days in a greenhouse.

Example 12 Molecular Characterization of the Herbicide Tolerant Transformants

Leaf tissue collected from each individual plant transformant is used for copy number analysis and molecular characterization of PPO gene sequence mutations. Genomic DNA is extracted using a Wizard 96 Magnetic DNA Plant System kit (Promega, U.S. Pat. Nos. 6,027,945 & 6,368,800) as directed by the manufacturer. Isolated DNA is PCR amplified using the appropriate probe, together with forward and reverse primers. Following this quantitative PCR analysis to verify copy number of the T-DNA insert, only low copy transgenic plants that exhibit tolerance to the selection agent are kept for harvest of T1 seeds. Seeds are then harvested three to five months after transplanting. PCR amplification of PPO genomic sequences is performed using Fusion Taq DNA Polymerase (Thermo Scientific) with the following thermocycling program: 96° C. for 15 min, followed by 35 cycles (96° C., 30 sec; 58° C., 30 sec; 72° C., 3 min and 30 sec), and 10 min at 72° C. PCR products are verified for concentration and fragment size via agarose gel electrophoresis and sent for sequencing using the PCR primers. Sequence analysis is performed on the representative chromatogram trace files and the corresponding AlignX alignment with default settings, edited to call secondary peaks.
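For planning purposes, the programmed hold time of the thermocycling program above can be added up as follows. This is a minimal sketch: the variable and function names are illustrative assumptions, and instrument ramping time between steps is not included.

```python
# Hold times in seconds, taken from the thermocycling program in the text.
INITIAL_DENATURATION = 15 * 60           # 96 C for 15 min
CYCLE_STEPS = [
    ("96 C denaturation", 30),
    ("58 C annealing", 30),
    ("72 C extension", 3 * 60 + 30),
]
CYCLES = 35
FINAL_EXTENSION = 10 * 60                # 10 min at 72 C

def total_hold_seconds(initial, steps, cycles, final):
    """Sum the programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(seconds for _, seconds in steps)
    return initial + cycles * per_cycle + final

total = total_hold_seconds(INITIAL_DENATURATION, CYCLE_STEPS, CYCLES, FINAL_EXTENSION)
print(f"{total} s = {total / 60:.1f} min")  # 10950 s = 182.5 min
```

The long extension step (3 min 30 sec per cycle) dominates the run, so the program takes roughly three hours before ramping is counted.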

Mutations identified in several individuals, based on sequence information, show that the technology described in this invention, which involves fusion of NA to the CRISPR components, is applicable to plant organisms. Homologous recombination repair with the provided donors confers tolerance to Saflufenacil (single site mutation and multiple site mutation).

Example 13 Controlled Gene Knockout in Escherichia coli

In this example FusionCRISPR is used to knock out the target gene RecA in E. coli strain K-12 substr. MG1655. The bacterial strain is inoculated in 10 ml SOB in a 100 ml Erlenmeyer flask and grown overnight at 37° C. (SOB: 2% bacto-tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl₂, 10 mM MgSO₄). 3 ml of the overnight culture is diluted in 250 ml SOB in a 1 liter Erlenmeyer flask and grown at 18° C. with vigorous shaking (200-250 rpm) until the OD₆₆₀ is 0.6. Subsequently the culture is transferred to precooled 50 ml tubes and centrifuged at 5000 rpm for 5 min at 4° C. The pellet is resuspended in ⅓ of the original volume of ice-cold TB (TB: 250 mM KCl, 10 mM PIPES free acid, 15 mM CaCl₂·2H₂O, 55 mM MnCl₂·2H₂O) and incubated on ice for 10 min. The cells are centrifuged at 5000 rpm for 5 min at 4° C. and the pellet is resuspended in 1/12 of the original volume of ice-cold TB. DMSO is added with gentle mixing to a final concentration of 7%. The competent cells are aliquoted in 200 μl portions and frozen in liquid nitrogen. One aliquot of competent cells is combined with 0.1-0.5 μg of plasmid containing a chloramphenicol selectable marker and Cas9 expression cassette as present in pCas9 [Jiang W, Bikard D, Cox D, Zhang F, Marraffini L (2013) RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat Biotechnol] and a cassette for expression of fusion RNA [Zhao D, Yuan S, Xiong B, Sun H, Ye L, Li J, Zhang X, Bi C. (2016) Development of a fast and easy method for Escherichia coli genome editing with CRISPR/Cas9. Microb Cell Fact. 15(1):205] with the following FusionCRISPR sequence and RecA spacer:

(SEQ ID NO: 42) gatgtggaaaccatctctacGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTT CCATGGATGTGGAAACCATCGCTTTCACTGGATATCGCG

in which the spacer recognizing RecA is in lowercase (i.e., gatgtggaaaccatctctac), the essential sequences for the sgRNA are capitalized, not underlined, homology arm 1 with RecA is double underlined and homology arm 2 with RecA is single underlined. Promoters and terminators for the FusionCRISPR construct and Cas9 can be chosen from world wide web at parts.igem.org/Promoters/Catalog/Constitutive and world wide web at parts.igem.org/Terminators/Catalog. The targeted RecA gene has the following sequence:

(SEQ ID NO: 43) ATGGCTATCGACGAAAACAAACAGAAAGCGTTGGCGGCAGCACTGGGCCAGATTGA-GAAACAATTTGGTAAAGGCTCCATCATGCGCCTGGGTGAAGACCGTTCCATGGATGTG-GAAACCAT CTCTAC

TT CGCTTTCACTGGAT ATCGCGCTTGGGG-CAGGTGGTCTGCCGATGGGCCGTATCGTCGAAATCTACGGACCGGAATCTTCCGG-TAAAACCACGCTGACGCTGCAGGTGATCGCCGCAGCGCAGCGTGAAGGTAAAAC-CTGTGCGTTTATCGATGCTGAACACGCGCTGGACCCAATCTACGCAC-GTAAACTGGGCGTCGATATCGACAACCTGCTGTGCTCCCAGCCGGACACCGGCGAG-CAGGCACTGGAAATCTGTGACGCCCTGGCGCGTTCTGGCGCAGTAGAC-GTTATCGTCGTTGACTCCGTGGCGGCACTGACGCCGAAAGCG-GAAATCGAAGGCGAAATCGGCGACTCTCACATGGGCCTTGCGGCAC-GTATGATGAGCCAGGCGATGCGTAAGCTGGCGGGTAACCTGAAGCAGTCCAACAC-GCTGCTGATCTTCATCAACCAGATCCGTATGAAAATTGGTGTGATGTTCGGTAACCCG-GAAACCACTACCGGTGGTAACGCGCTGAAATTCTAC-GCCTCTGTTCGTCTCGACATCCGTCGTATCGGCGCGGTGAAAGAGGGCGAAAAC-GTGGTGGGTAGCGAAACCCGCGTGAAAGTGGTGAA-GAACAAAATCGCTGCGCCGTTTAAACAGGCTGAATTCCAGATCCTCTAC-GGCGAAGGTATCAACTTCTACGGCGAACTGGTTGACCTGGGCGTAAAAGAGAA-GCTGATCGAGAAAGCAGGCGCGTGGTACAGCTACAAAGGTGAGAAGATCGGTCAGGG-TAAAGCGAATGCGACTGCCTGGCTGAAAGATAACCCGGAAACCGCGAAAGAGATCGA-GAAGAAAGTACGTGAGTTGCTGCTGAGCAACCCGAACTCAACGCCGGATTTCTCTG-TAGATGATAGCGAAGGCGTAGCAGAAACTAACGAAGATTTTTAAin which the PAM sequence is in italics, homology arm 1 is doubleunderlined, homology arm 2 is single underlined, and the protospacer isthe following portion of the sequence:

GATGTGGAAACCATCTCTAC.

DNA and cells are kept on ice for 30 minutes prior to a 90 second heat shock at 42° C. Cells and DNA are transferred to ice and 1 ml LB is added after 1 minute (LB: 1% tryptone, 1% NaCl, 0.5% yeast extract, pH 7.0). Cells are allowed to recover for 1 hour at 37° C. The recovery phase can be extended to 16 hours to allow the FusionCRISPR components more time to edit the E. coli genome. 25 μg/ml chloramphenicol should be added after 1 hour to prevent loss of the plasmid. Cells are plated on LB medium with 25 μg/ml chloramphenicol and incubated at 37° C. for 1 day. Single colonies are selected from the plate and grown overnight in LB with chloramphenicol at 37° C., after which genomic DNA is extracted [He, F. (2011) E. coli Genomic DNA Extraction. Bio-protocol Bio101: e97]. PCR with a forward primer upstream from the first homology arm (ATGGCTATCGACGAAAACAAA) (SEQ ID NO: 44) and a reverse primer downstream from the second homology arm (CGTCAGCGTGGTTTTACCGGA) (SEQ ID NO: 45) is performed to identify colonies in which the 11 nucleotides shown in bold in the RecA sequence (SEQ ID NO: 43) are no longer present due to homologous recombination repair with the FusionCRISPR template. PCR fragments can be sequenced (expected size 220 bp) or, in this case, subjected to AgeI digestion (the deleted sequence around the PAM contains the AgeI recognition site ACCGGT) to verify modification of the locus after standard gel electrophoresis. Deletion of 11 nucleotides ensures a disruption of the open reading frame.
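The expected outcome of this repair can be checked in silico with a short sketch. The homology arms, protospacer and flanking target bases below are copied from this example; the three PAM nucleotides (CGG) are an assumption inferred from the NGG rule and from the stated AgeI site, since the PAM is only marked by formatting in the sequence listing. The function name and the short target window are likewise illustrative.

```python
# Sequences copied from Example 13; ASSUMED_PAM is an inference, not source text.
ARM1 = "CCATGGATGTGGAAACCAT"        # homology arm 1 of the donor
ARM2 = "CGCTTTCACTGGATATCGCG"       # homology arm 2 of the donor
PROTOSPACER = "GATGTGGAAACCATCTCTAC"
ASSUMED_PAM = "CGG"                 # assumption: NGG PAM consistent with the AgeI site
AGEI_SITE = "ACCGGT"

# Short window of the RecA target around the cut site (not the full gene).
TARGET = ("GAAGACCGTTCCATGGATGTGGAAACCATCTCTAC" + ASSUMED_PAM
          + "TTCGCTTTCACTGGATATCGCGCTTGGGG")

def simulate_repair(target, arm1, arm2):
    """Splice out whatever lies between two directly adjacent homology arms."""
    arm1_start = target.find(arm1)
    if arm1_start < 0:
        raise ValueError("homology arm 1 not found in target")
    arm1_end = arm1_start + len(arm1)
    arm2_start = target.find(arm2, arm1_end)
    if arm2_start < 0:
        raise ValueError("homology arm 2 not found downstream of arm 1")
    deleted = target[arm1_end:arm2_start]
    edited = target[:arm1_end] + target[arm2_start:]
    return edited, deleted

edited, deleted = simulate_repair(TARGET, ARM1, ARM2)
print("deleted:", deleted, len(deleted), "nt")
print("frameshift:", len(deleted) % 3 != 0)
print("AgeI site before/after:", AGEI_SITE in TARGET, AGEI_SITE in edited)
print("protospacer+PAM still present:", (PROTOSPACER + ASSUMED_PAM) in edited)
```

Under these assumptions the sketch removes 11 nt, which is not a multiple of three, and the edited window no longer contains the AgeI site or the intact protospacer plus PAM, in line with the diagnostic digest described above.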

Example 14 Controlled Knockout of the PRDM9 Gene in Human Induced Pluripotent Stem Cells (hiPSCs) and HEK293 Cells

Cell culture maintenance, plasmid construction, transfection methods and molecular analysis of genome editing in hiPSCs and HEK293 cells are described in great detail in Yang L, Yang J L, Byrne S, Pan J, Church G (2014) CRISPR/Cas9-directed genome editing of cultured cells. Current Protocols in Molecular Biology 31.1.1-31.1.17. For knockout of the PRDM9 gene, all steps are followed as described therein, with only a minor change in the gRNA plasmid design. The synthesized gRNA should have the following sequence:

(SEQ ID NO: 46) TGTACAAAAAAGCAGGCTTTAAAGGAACCAATTCAGTCGACTGGATCCGGTAC-CAAGGTCGGGCAGGAAGAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATAC-GATACAAGGCTGTTAGAGAGATAATTAGAATTAATTTGACTGTAAACACAAAGATATTAG-TACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAG-TTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTT-GAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCgg-catccctcaggctgggctGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAG-TCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGGCCATCAGGCATCCCTCAG-TATGGAA TGAGGCATCTGATttttttin which the U6 promoter is shown in italics, the spacer recognizingPRDM9 exon ENSE00001804383 is in lowercase (i.e., ggcatccctcaggctgggct),the essential sequences for the sgRNA is capitalized not underlined orin italics, homology arm 1 with PRDM9 is double underlined, homology arm2 with PRDM9 is single underlined and the terminator is in small case.The targeted PRDM9 exon ENSE00001804383 has the sequence:

(SEQ ID NO: 47) ATTGTGAGATGTGTCAGAACTTCTTCATT-GACAGCTGTGCTGCCCATGGGCCCCCTACATTTGTAAAGGACAGTGCAGTG-GACAAGGGGCACCCCAACCGTTCAGCCCTCAGTCTGCCCCCAGGGCTGAGAATT-GGGCCATCAGGCA TCCCTCAG GCTGGGCT 

AG-TATGGAATGAGGCATCTGATCTGCCGCTGGGTCTGCACTTT-GGCCCTTATGAGGGCCGAATTACAGAAGACGAAGAGGCAGCCAACAATGGA-TACTCCTGGCTGTGGin which the PAM sequence is in italics, homology arm 1 is doubleunderlined, homology arm 2 is single underlined, and the protospacer isthe following portion of the sequence:

GGCATCCCTCAGGCTGGGCT.

The nucleotides shown in boldface are deleted upon homologous recombination with the FusionCRISPR construct, resulting in a frame shift, as shown by PCR amplification of the respective genomic region from genomic DNA and subsequent sequencing of the resulting PCR products.

Example 15: Introduction of Point Mutations in Rice Plants Leading to Cyclohexanedione (DIM) and/or Aryloxyphenoxypropionate (FOP) Herbicide Tolerance in Rice

Mutations I1781L and G2096S in plastidic Acetyl Coenzyme A Carboxylase (ACCase) are known to confer tolerance to DIM and FOP herbicides. These mutations can be introduced at the endogenous ACCase locus using the following vectors.

Vector RLW137 SEQ ID NO: 66

The backbone of this vector is the Gateway-enabled construct RLW121 SEQ ID NO: 62. ENTR vectors for RLW137 are vectors CC003 SEQ ID NO: 63 (selectable marker for the incoming T-DNA), CC018 SEQ ID NO: 64 (producing the FusionCRISPR construct which targets and introduces G2096S after cutting upstream from the DNA that corresponds to G2096) and CC006 SEQ ID NO: 65 (providing Cas9).

CC018 (short for CRISPRCas018) contains ˜300 nt homology arms flanking the incoming nucleotides (in this case encoding G2096S). Additional mutations are co-introduced to avoid self-cleavage of the T-DNA (mutated PAM; alternatively or in addition, the spacer could include many mutations, which are preferably silent in parts that correspond to exons and do not affect intron/exon borders if present) and to avoid early termination of transcription at long stretches of T present either in the homology arms or in the incoming nucleotides.
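A donor design can be screened for such T stretches before synthesis. The following is only a sketch: the function name, the default threshold of five consecutive T and the toy sequence are assumptions chosen for illustration, since the text does not state what length counts as a long stretch.

```python
import re

def find_t_runs(seq, min_len=5):
    """Return (start, length) for each run of at least min_len consecutive T (or U).

    min_len is an illustrative assumption, not a value given in the text.
    """
    seq = seq.upper().replace("U", "T")
    return [(m.start(), len(m.group())) for m in re.finditer("T{%d,}" % min_len, seq)]

# Hypothetical toy donor fragment containing one long and one short T run.
toy_donor = "ACGTTTTTTACGTTTA"
print(find_t_runs(toy_donor))             # only the run of six T is reported
print(find_t_runs(toy_donor, min_len=3))  # a lower threshold also reports the run of three
```

Positions reported by such a scan are candidates for the silent or otherwise tolerated substitutions mentioned above.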

A control vector is synthesized which is identical except that the donor molecule is expressed as a separate molecule which is not linked to the guide RNA.

Vector RLW137 and the control vector are transformed into rice using the protocol described above. Initial selection is for the presence of the ZmAHAS A122T S553N marker. Analysis of the transformed plants is performed as described in example 12.

Similar to the procedure described above for RLW137, RLW138 introduces the same mutation, but this time using an alternative, downstream protospacer site. RLW138 consists of the RLW121 backbone with CC003, CC019 (SEQ ID NO: 67) and CC006. The mutation I1781L is introduced by RLW139 (SEQ ID NO: 70) consisting of RLW121, CC003, CC020 (SEQ ID NO: 69) and CC006.

Example 16 Application of Fusion CRISPR in Bacillus

Electrocompetent Bacillus subtilis Cells and Electroporation

Transformation of DNA into B. subtilis ATCC 6051 is performed via electroporation. Preparation of electrocompetent B. subtilis ATCC 6051 cells and transformation of DNA are performed essentially as described by Xue et al. (Xue, G.-P., 1999, Journal of Microbiological Methods 34, 183-191) with the following modification: upon transformation of DNA, cells are recovered in 1 ml LBSPG buffer and incubated for 60 min at 37° C. (Vehmaanperä J., 1989, FEMS Microbio. Lett., 61: 165-170), followed by plating on selective LB-agar plates. For plasmids containing the temperature-sensitive pE194 replication origin, cells are recovered for 3 h at 33° C.

Plasmid Isolation

Plasmid DNA was isolated from Bacillus and E. coli cells by standard molecular biology methods described in (Sambrook, J. and Russell, D. W. Molecular cloning. A laboratory manual, 3rd ed, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 2001.) or the alkaline lysis method (Birnboim, H. C., Doly, J. (1979). Nucleic Acids Res 7(6): 1513-1523). In contrast to E. coli cells, Bacillus cells were treated with 10 mg/ml lysozyme for 30 min at 37° C. prior to cell lysis.

Annealing of oligonucleotides to form oligonucleotide-duplexes.

Oligonucleotides were adjusted to a concentration of 100 μM in water. 5 μl of the forward and 5 μl of the corresponding reverse oligonucleotide were added to 90 μl 30 mM Hepes buffer (pH 7.8). The reaction mixture was heated to 95° C. for 5 min, followed by annealing by ramping from 95° C. to 4° C., decreasing the temperature by 0.1° C./sec (Cobb, R. E., Wang, Y., & Zhao, H. (2015). High-Efficiency Multiplex Genome Editing of Streptomyces Species Using an Engineered CRISPR/Cas System. ACS Synthetic Biology, 4(6), 723-728).
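The concentrations and ramp time implied by this annealing protocol follow from simple dilution arithmetic, sketched below. The variable names are illustrative, and the calculation assumes that the volumes are additive and that the ramp is linear at the stated rate.

```python
# Values taken from the annealing protocol above.
stock_uM = 100.0        # each oligonucleotide stock
oligo_ul = 5.0          # volume of each oligonucleotide
buffer_ul = 90.0        # volume of 30 mM Hepes buffer
buffer_mM = 30.0

total_ul = 2 * oligo_ul + buffer_ul                  # assumes additive volumes
final_oligo_uM = stock_uM * oligo_ul / total_ul      # per oligonucleotide
final_hepes_mM = buffer_mM * buffer_ul / total_ul

ramp_rate_c_per_s = 0.1
ramp_seconds = (95.0 - 4.0) / ramp_rate_c_per_s      # assumes a linear ramp

print(f"total volume: {total_ul:.0f} ul")
print(f"each oligonucleotide: {final_oligo_uM:.1f} uM")
print(f"Hepes: {final_hepes_mM:.1f} mM")
print(f"ramp: {ramp_seconds:.0f} s (about {ramp_seconds / 60:.1f} min)")
```

Under these assumptions each oligonucleotide ends up at 5 μM in 100 μl, and the cooling ramp takes about 15 minutes.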

Molecular Biology Methods and Techniques

Plasmid pJOE8999:

Altenbuchner J. 2016. Editing of the Bacillus subtilis genome by the CRISPR-Cas9 system. Appl Environ Microbiol 82:5421-5.

Plasmid pCC001

Plasmid pJOE8999 and the synthetic gene fragment Seq ID 048, provided in a standard E. coli cloning vector (pUC derivative), are cut with AvrII and XbaI, followed by isolation of the pJOE8999 plasmid backbone and the smaller AvrII/XbaI fragment of Seq ID 048. The two fragments are ligated using T4-DNA ligase (NEB), followed by transformation into E. coli XL1-Blue competent cells (Stratagene). The correct plasmid was recovered and named pCC001.

Plasmid pCC002

Plasmid pJOE8999 and the synthetic gene fragment Seq ID 049, provided in a standard E. coli cloning vector (pUC derivative), are cut with AvrII and XbaI, followed by isolation of the pJOE8999 plasmid backbone and the smaller AvrII/XbaI fragment of Seq ID 049. The two fragments are ligated using T4-DNA ligase (NEB), followed by transformation into E. coli XL1-Blue competent cells (Stratagene). The correct plasmid was recovered and named pCC002.

Plasmid pCC003

The oligonucleotides Seq ID 050 and Seq ID 051 with 5′ phosphorylation are annealed to form an oligonucleotide duplex encoding the protospacer sequence targeting the amylase gene amyE of B. subtilis ATCC6051. The plasmid pJOE8999 is cut with BsaI, followed by ligation of the oligonucleotide duplex to recover plasmid pCC003.

Plasmid pCC004

The 5′ homology region (also referred to as HomA) and the 3′ homology region (also referred to as HomB) adjacent to the amylase amyE gene of B. subtilis ATCC6051 were PCR-amplified on isolated genomic DNA with oligonucleotides Seq ID NO: 52, Seq ID NO: 53 and Seq ID NO: 54, Seq ID NO: 55, respectively. The two homology regions HomA and HomB were fused and amplified using overlap PCR with oligonucleotides Seq ID NO: 52 and Seq ID NO: 55 to recover the HomAB PCR fragment of the homology regions of the amyE gene. The plasmid pCC003 and the HomAB-amyE PCR fragment were cut with SfiI, followed by ligation with T4-DNA ligase (NEB). The reaction mixture was transformed into E. coli XL1-Blue competent cells (Stratagene). The correct plasmid containing the amyE protospacer and the HomAB of amyE was recovered and named pCC004 (FIG. 23).

Plasmid pCC005

The plasmid pCC001 was cut with BsaI, followed by cloning of the amyE protospacer oligonucleotide duplex (Seq ID 050/Seq ID 051) as described for pCC003. The resulting plasmid and the PCR fragment of the homology regions HomAB of the amyE gene, as described for construction of pCC004, were cut with SfiI, followed by ligation with T4-DNA ligase (NEB). The reaction mixture was transformed into E. coli XL1-Blue competent cells (Stratagene). The correct plasmid containing the amyE protospacer and the HomAB of amyE was recovered and named pCC005.

Plasmid pCC006

The 5′ homology region (also referred to as HomA) and the 3′ homology region (also referred to as HomB) adjacent to the amylase amyE gene of B. subtilis ATCC6051 were PCR-amplified on isolated genomic DNA with oligonucleotides Seq ID NO: 56, Seq ID NO: 57 and Seq ID NO: 58, Seq ID NO: 59, respectively. The two homology regions HomA and HomB were fused and amplified using overlap PCR with oligonucleotides Seq ID NO: 56 and Seq ID NO: 59 to recover the HomAB PCR fragment of the homology regions of the amyE gene. The plasmid pCC001 was cut with BsaI, followed by ligation of the amyE protospacer oligonucleotide duplex (Seq ID NO: 50/Seq ID NO: 51) with T4-DNA ligase (NEB) as described for pCC003. The resulting plasmid and the PCR fragment of the homology regions HomAB of the amyE gene were cut with SfiI, followed by ligation with T4-DNA ligase (NEB). The reaction mixture was transformed into E. coli XL1-Blue competent cells (Stratagene). The correct plasmid containing the amyE protospacer and the HomAB of amyE in reverse orientation compared to pCC005 was recovered and named pCC006.

Plasmid pCC007

The plasmid pCC002 was cut with BsaI, followed by ligation of the amyE protospacer oligonucleotide duplex (Seq ID NO: 50/Seq ID NO: 51) with T4-DNA ligase (NEB) as described for pCC003. The resulting plasmid and the PCR fragment of the homology regions HomAB of the amyE gene, as described for construction of pCC004, were cut with SfiI, followed by ligation with T4-DNA ligase (NEB). The reaction mixture was transformed into E. coli XL1-Blue competent cells (Stratagene). The correct plasmid containing the HomAB of the amyE gene and the amyE protospacer was recovered and named pCC007.

Plasmid pCC008

The plasmid pCC002 was cut with BsaI, followed by ligation of the amyE protospacer oligonucleotide duplex (Seq ID NO: 50/Seq ID NO: 51) with T4-DNA ligase (NEB) as described for pCC003. The resulting plasmid and the PCR fragment of the homology regions HomAB of the amyE gene, amplified with oligonucleotides Seq ID NO: 56 and Seq ID NO: 59 as described for pCC006, were cut with SfiI, followed by ligation with T4-DNA ligase (NEB). The reaction mixture was transformed into E. coli XL1-Blue competent cells (Stratagene). The correct plasmid containing the HomAB of amyE in reverse orientation compared to pCC007 and the amyE protospacer was recovered and named pCC008.

Gene Deletion Using Fusion-CRISPR.

Electrocompetent B. subtilis ATCC6051 cells were transformed with 1 μg each of plasmids pJOE8999, pCC004, pCC005, pCC006, pCC007 and pCC008 essentially as described by Xue et al. (Xue, G.-P., 1999, Journal of Microbiological Methods 34, 183-191) with the following modification: upon transformation of DNA, cells were recovered in 1 ml LBSPG buffer and incubated for 3 h at 33° C. (Vehmaanperä J., 1989, FEMS Microbio. Lett., 61: 165-170), followed by plating on LB-Lennox plates supplemented with 20 μg/ml kanamycin and 0.2% D-Mannose for Cas9 induction. Plates were incubated for 20-22 h at 33° C. Up to 10 clones from each plasmid transformation were picked and transferred onto a fresh preheated LB-Lennox plate, followed by incubation at 50° C. for 18 h. From each large grown colony, cells were picked and 3 streaks on fresh LB-Lennox plates were performed to yield single colonies after 7-8 h incubation at 45° C. Single colonies were transferred onto LB-Lennox plates and LB-Lennox plates supplemented with 20 μg/ml kanamycin, followed by incubation for 16-18 hours at 30° C. Kanamycin-sensitive clones, indicative of plasmid loss, were plated on LB-Lennox plates supplemented with 1% soluble starch, followed by incubation for 20 hours at 30° C. Inactivation of the amylase amyE gene was visualized by covering the plates with iodine-containing Lugol's solution, and the plates were analyzed for the presence or absence of a light halo, the latter indicating a successful inactivation.

Table 2 summarizes the total number of clones after plasmid curing, the number of clones with inactivated amylase, the percentage of clones with inactivated amylase relative to total clones and the relative knockout efficiency with the indicated plasmids relative to pCC004 (FIG. 25).

TABLE 2

Construct    Subclones total    Subclones Amy. neg.    Subclones Amy. neg. [%]    Relative to pCC004 [%]
pJOE8999      90                  0                      0                          0
pCC004       177                 42                     24                        100
pCC005       197                113                     57                        242
pCC006       192                 79                     41                        173
pCC007       117                 95                     81                        342
pCC008       146                116                     79                        335
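The derived columns of Table 2 can be recomputed from the raw clone counts. The sketch below is illustrative only: the function name and data layout are assumptions, while the counts are those given in Table 2.

```python
# (construct, total subclones, amylase-negative subclones), as listed in Table 2.
ROWS = [
    ("pJOE8999", 90, 0),
    ("pCC004", 177, 42),
    ("pCC005", 197, 113),
    ("pCC006", 192, 79),
    ("pCC007", 117, 95),
    ("pCC008", 146, 116),
]

def summarize(rows, reference="pCC004"):
    """Return {construct: (percent Amy. neg., percent relative to the reference)}."""
    fraction = {name: negative / total for name, total, negative in rows}
    ref = fraction[reference]
    # The relative value is computed from unrounded fractions, then rounded.
    return {name: (round(100 * f), round(100 * f / ref)) for name, f in fraction.items()}

for name, (percent_negative, relative) in summarize(ROWS).items():
    print(f"{name:9s} {percent_negative:3d} {relative:4d}")
```

The relative values reproduce the last column of the table only when they are derived from the unrounded fractions; dividing the rounded percentages (for example 57 by 24) would give slightly different numbers.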

1. A method for modification of a target nucleic acid (target NA) molecule in a cell comprising the steps of: a. providing a recombinant fusion nucleic acid (fuNA) molecule comprising a guide nucleic acid (gNA) molecule covalently linked to at least one donor nucleic acid (doNA) molecule, and b. introducing said fuNA molecule into one or more cell(s) comprising the target NA molecule, and c. introducing a site directed nucleic acid modifying polypeptide into said one or more cell(s), and d. incubating the one or more cells under conditions that allow for homologous recombination in said one or more cell(s), wherein the doNA comprises two homology arms and wherein said two homology arms are directly adjacent to each other.
2. The method of claim 1, further comprising isolating one or more cells in which homologous recombination occurred.
3. The method of claim 1, wherein, the doNA molecule and gNA molecule consist of RNA, DNA, or PNA.
4. The method of claim 1, wherein the doNA molecule consists of DNA and the gNA molecule consists of RNA, or wherein the fuNA consists of RNA.
5. The method of claim 1, wherein each homology arm independently comprises at least 15 bases complementary to a different area of at least 15 consecutive bases of the target NA molecule from the other homology arm, wherein the homology arms each comprise at least 15 bases that are 100% complementary to the same number of consecutive bases in the target NA molecule.
6. The method of claim 1, wherein one or both of the homology arms comprise more than 15 bases, and wherein the homology arm(s) comprising more than 15 bases is/are at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% complementary to an area of the target NA molecule.
7. The method of claim 1, wherein the gNA comprises a spacer NA and a scaffold NA molecule, wherein the scaffold NA forms a secondary structure comprising at least one hairpin, and wherein the spacer NA comprises at least 18 bases 100% complementary to the same number of consecutive bases of the target NA molecule.
8. The method of claim 1, wherein the cell is a microbial, animal, human or plant cell.
9. The method of claim 1, wherein the site directed nucleic acid modifying polypeptide has a double-stranded nucleic acid digestion function, or a nickase function, cutting only one strand of a double-stranded nucleic acid molecule.
10. The method of claim 1, wherein the fuNA molecule is a fuRNA molecule and is introduced into the cell or composition comprising the target NA molecule as an RNA molecule or as one or more expression constructs encoding said fuRNA molecule.
11. A recombinant fuNA molecule comprising a doNA molecule covalently linked to a gNA molecule, wherein the doNA comprises two homology arms each independently comprising at least 15 bases complementary to a different area of at least 15 consecutive bases of the target NA molecule from the other homology arm, and wherein said two homology arms are directly adjacent to each other.
12. The recombinant fuNA molecule of claim 11, wherein the doNA molecule consists of DNA and the guide NA molecule consists of RNA, or wherein the fuNA consists of RNA.
13. The recombinant fuNA molecule of claim 11, wherein the homology arms each comprise at least 15 bases that are 100% complementary to the same number of consecutive bases in the target NA molecule, and wherein, in the event a homology arm is larger than 15 bases, it is at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% complementary to the target NA molecule.
14. The recombinant fuNA molecule of claim 11, wherein the gNA comprises a spacer NA and a scaffold NA molecule, wherein the scaffold NA forms a secondary structure comprising at least one hairpin, and wherein the spacer NA comprises at least 18 bases 100% complementary to the same number of consecutive bases of the target NA molecule.
15. A recombinant fusion nucleic acid (fuNA) molecule comprising a guide nucleic acid (gNA) molecule covalently linked to a donor nucleic acid (doNA) molecule, wherein said fuNA molecule consists of RNA, wherein said doNA molecule comprises two homology arms, wherein each of said two homology arms independently comprises at least 15 bases complementary to a different area of at least 15 consecutive bases of the target NA molecule from the other homology arm, wherein at least 15 bases are 100% complementary to the same number of consecutive bases in the target NA molecule, wherein both homology arms are separated by at least one or more bases or wherein both homology arms are directly adjacent to each other, and wherein both homology arms have the same length or different lengths.
16. The recombinant fuNA molecule of claim 15, wherein the gNA molecule comprises a spacer nucleic acid (spacer NA) molecule and a scaffold nucleic acid (scaffold NA) molecule, wherein said spacer NA molecule comprises at least 12 bases, at least 13 bases, at least 14 bases, at least 15 bases, at least 16 bases, at least 17 bases, at least 18 bases, at least 19 bases, or at least 20 bases which are 100% complementary to the target NA molecule, and wherein said scaffold NA forms a secondary structure comprising at least one hairpin, particularly preferably at least two hairpins.
17. A vector comprising an expression construct comprising a promoter functionally linked to a DNA molecule encoding the fuNA molecule of claim 11.
18. A vector system comprising the vector of claim 17 and a vector encoding a site directed nucleic acid modifying polypeptide.
19. A system for modification of a target NA molecule in a cell comprising a. the vector of claim 17, b. a vector encoding a site directed nucleic acid modifying polypeptide, and c. a cell comprising a target NA molecule.
20. A composition comprising a. the vector of claim 17, b. a vector encoding a site directed nucleic acid modifying polypeptide, and c. a cell comprising a target NA molecule.
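
By way of illustration only, and without limiting the claims, the sequence condition recited for the homology arms (each arm having at least 15 bases that are 100% complementary to consecutive bases of the target NA molecule, the two arms matching different areas) could be checked in silico roughly as sketched below. The sketch rests on assumptions that are not part of the disclosure: all function names are hypothetical, the target is supplied as a single strand read 5' to 3', complementarity is taken as antiparallel Watson-Crick pairing with U treated as T, and "different area" is read as non-overlapping match positions.

```python
# Illustrative sketch only; names and the reading of "different area" are assumptions.

def to_dna(seq):
    """Upper-case a sequence and map RNA 'U' to DNA 'T' for comparison."""
    return seq.upper().replace("U", "T")

def reverse_complement(seq):
    """Return the antiparallel Watson-Crick complement of a sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(to_dna(seq)))

def perfect_match_starts(arm, target, min_len=15):
    """Target positions where some min_len-base window of the arm is
    100% complementary to min_len consecutive target bases."""
    arm, target = to_dna(arm), to_dna(target)
    starts = set()
    # Arms shorter than min_len yield no windows and therefore no matches.
    for i in range(len(arm) - min_len + 1):
        site = reverse_complement(arm[i:i + min_len])
        pos = target.find(site)
        while pos != -1:
            starts.add(pos)
            pos = target.find(site, pos + 1)
    return starts

def arms_match_different_areas(arm1, arm2, target, min_len=15):
    """True if both arms have a perfect min_len-base match and at least
    one pair of those matches does not overlap on the target."""
    s1 = perfect_match_starts(arm1, target, min_len)
    s2 = perfect_match_starts(arm2, target, min_len)
    return any(abs(a - b) >= min_len for a in s1 for b in s2)

# Hypothetical 40-base target and two arms derived from separate regions of it.
target = "ATGCGTACCGTTAGCTAGGCTTACGATCGGATCCGTAAGC"
arm_a = reverse_complement(target[0:15])
arm_b = reverse_complement(target[20:35])
print(arms_match_different_areas(arm_a, arm_b, target))
```

The sketch addresses only the 15-base perfect-match condition. The percentage complementarity recited for longer arms, and the spacer and scaffold features of the gNA, would need separate checks.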